Executive Summary


Introduction: 

This is the final report on the work done under contract DASG-60-92-C-0055 from Phillips
Labs and ARPA to the Department of Computer Science at the University of Maryland.
The work started 04/28/92. The goal of this project was to create an environment for the
development and deployment of critical applications with hard real-time constraints in a
reactive environment. We have redesigned the Maruti system to address these issues. In this
report we highlight the achievements of this contract. A publications list and a copy of each
of the publications are also attached.

Application  Development  Environment: 

To support applications in a real-time system, conventional application development
techniques and tools must be augmented with support for the specification and extraction of
resource requirements and timing constraints. The application development system
provides a set of programming tools to support and facilitate the development of real-time
applications with diverse requirements. The Maruti Programming Language (MPL) is used
to develop individual program modules. The Maruti Configuration Language (MCL) is
used to specify how individual program modules are to be connected together to form an
application and the details of the hardware on which the application is to be executed.

In  the  current  version,  the  base  programming  language  used  is  ANSI  C.  MPL  adds 
modules,  shared  memory  blocks,  critical  regions,  typed  message  passing,  periodic 
functions,  and  message-invoked  functions  to  the  C  language.  To  make  analyzing  the 
resource  usage  of  programs  feasible,  certain  C  idioms  are  not  allowed  in  MPL;  in 
particular,  recursive  function  calls  are  not  allowed  nor  are  unbounded  loops  containing 
externally  visible  events,  such  as  message  passing  and  critical  region  transition. 

MPL modules are brought together into an executable application by a specification file
written in the Maruti Configuration Language (MCL). The MCL specification determines
the application's hard real-time constraints, the allocation of tasks, threads, and shared
memory blocks, and all message-passing connections. MCL is an interpreted C-like
language rather than a declarative language, allowing the instantiation of complicated
subsystems using loops and subroutines in the specification.


Analysis  and  Resource  Allocations: 

The basic building block of the Maruti computation model is the elemental unit (EU). In
general an elemental unit is an executable entity which is triggered by incoming data and
signals, operates on the input data, and produces some output data and signals. The
behavior of an EU is atomic with respect to its environment. Specifically:

•  All resources needed by an elemental unit are assumed to be required for the entire
length of its execution.

•  The interaction of an EU with other entities of the system occurs either before it starts
executing or after it finishes execution.

In order to define complex executions, the EUs may be composed together and properties
specified on the composition. Elemental units are composed by connecting an output port
of an EU with an input port of another EU. A valid connection requires that the input and
output port types are compatible, i.e., they carry the same message type. Such a
connection marks a one-way flow of data or control, depending on the nature of the ports.
A composition of EUs can be viewed as a directed acyclic graph, called an elemental unit
graph (EUG), in which the nodes are the EUs, and the edges are the connections between
EUs. An incompletely specified EUG, in which not all input and output ports are connected,
is termed a partial EUG (PEUG). A partial EUG may be viewed as a higher level EU.
In a complete EUG, all input and output ports are connected and there are no cycles in the
graph. The acyclic requirement comes from the required time determinacy of execution. A
program with unbounded cycles or recursions may not have a temporally determinate
execution time. Bounded cycles in an EUG are converted into an acyclic graph by loop
unrolling.
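
The acyclicity requirement on a complete EUG can be checked mechanically. The following is a minimal sketch, not part of the Maruti tools: it assumes an EUG is given as a list of EU names and a list of (producer, consumer) connections, and uses Kahn's topological-sort algorithm. The function name `is_acyclic` and the data layout are illustrative assumptions.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph (nodes, edges) has no cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with no incoming edge.
    The graph is acyclic exactly when every node can be removed this way.
    """
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while ready:
        v = ready.popleft()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(nodes)
```

A graph that fails this check would need its bounded loops unrolled before it could serve as a complete EUG.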

Program modules are independently compiled. In addition to the generation of the object
code, compilation also results in the creation of partial EUGs for the modules, i.e., for the
services and entries in the module, as well as the extraction of resource requirements such
as stack sizes of threads, memory requirements, and the logical resource requirements.

Given an application specification in the Maruti Configuration Language and the component
application modules, the integration tools are responsible for creating a complete application
program and extracting the resource and timing information for scheduling and
resource allocation. The inputs of the integration process are the program modules, the
partial EUGs corresponding to the modules, the application configuration specification, and
the hardware specifications. The outputs of the integration process are: a specification for
the loader for creating tasks, populating their address space, creating the threads and
channels, and initializing the task; loadable executables of the program; and the complete
application EUG along with the resource description for the resource allocation and the
scheduling subsystem.

After  the  application  program  has  been  analyzed  and  its  resource  requirements  and 
execution  constraints  identified,  it  can  be  allocated  and  scheduled  for  a  runtime  system. 

We consider static allocation and scheduling, in which a task is the finest granularity
object of allocation and an EU instance is the unit of scheduling. In order to make the
execution of instances satisfy the specification and meet the timing constraints, we consider
a scheduling frame whose length is the least common multiple of all tasks' periods. As
long as one instance of each EU is scheduled in each period within the scheduling frame
and these executions meet the timing constraints, a feasible schedule is obtained.
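
As a small illustration of the scheduling frame, the sketch below computes the frame length as the least common multiple of the task periods, and the number of instances of each task that must be placed within one frame. The task names, the integer time units, and the function names are assumptions made for the example.

```python
from functools import reduce
from math import gcd


def scheduling_frame(periods):
    """Length of the scheduling frame: the LCM of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods.values())


def instances_per_frame(periods):
    """Number of instances of each task that fall within one frame."""
    frame = scheduling_frame(periods)
    return {task: frame // period for task, period in periods.items()}


# Hypothetical periods, in arbitrary integer time units.
example_periods = {"sensor": 10, "control": 15, "logger": 6}
```

With these periods the frame is 30 time units long and holds 3, 2, and 5 instances of the three tasks respectively.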


Maruti  Runtime  System: 

The runtime system provides the conventional functionality of an operating system in a
manner that supports the timely dispatching of jobs. There are two major components of
the runtime system: the Maruti core, which is the operating system code that implements
scheduling, message passing, process control, thread control, and low level hardware
control, and the runtime dispatcher, which performs resource allocation and scheduling of
dynamic arrivals.

The core of the Maruti hard real-time runtime system consists of three data structures:


•  The  calendars  are  created  and  loaded  by  the  dispatcher.  Kernel  memory  is  reserved  for 
each  calendar  at  the  time  it  is  created.  Several  system  calls  serve  to  create,  delete, 
modify,  activate,  and  deactivate  calendars. 


•  The results table holds timing and status results for the execution of each elemental
unit. The maruti_calendar_results system call reports these results back up to the user
level, usually the dispatcher. The dispatcher can then keep statistics or write a trace
file.


•  The pending activation table holds all outstanding calendar activation and deactivation
requests. Since the requests can come from before the switch time, the kernel must
track the requests and execute them at the correct time in the correct order.
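
One way to picture the pending activation table is as a priority queue keyed by switch time. The sketch below is purely illustrative and does not reflect the actual Maruti kernel data structure: the class name, method names, and the tie-breaking rule (requests with equal switch times fire in arrival order) are all assumptions.

```python
import heapq
import itertools


class PendingActivations:
    """Hypothetical table of calendar (de)activation requests ordered by time."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: arrival order

    def request(self, switch_time, calendar_id, activate):
        """Record a request to activate or deactivate a calendar at switch_time."""
        entry = (switch_time, next(self._arrival), calendar_id, activate)
        heapq.heappush(self._heap, entry)

    def due(self, now):
        """Pop and return all requests whose switch time has been reached."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            switch_time, _, calendar_id, activate = heapq.heappop(self._heap)
            fired.append((switch_time, calendar_id, activate))
        return fired
```

Requests may be recorded long before their switch time; `due` releases them in time order when the clock reaches them.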

The Maruti design includes the concept of scenarios, implemented at runtime as sets of
alternative  calendars  that  can  be  switched  quickly  to  handle  an  emergency  or  a  change  in 
operating  mode.  These  calendars  are  pre-scheduled  and  able  to  begin  execution  without 
having  to  invoke  any  user-level  machinery.  The  dispatcher  loads  the  initial  scenarios 
specified  by  the  application  and  activates  one  of  them  to  begin  normal  execution. 

Optimal Replication of SP Graphs for Computation-Intensive Applications


Prof. Ashok K. Agrawala
Department of Computer Science
University of Maryland at College Park
A.V. Williams Building
College Park, Maryland 20742
(Tel): (301) 405-2525

Abstract 

We consider the replication problem of series-parallel (SP) task graphs where each task may
run on more than one processor. The objective of the problem is to minimize the total cost
of task execution and interprocessor communication. We call it the minimum cost replication
problem for SP graphs (MCRP-SP). In this paper, we adopt a new communication model where
the purpose of replication is to reduce the total cost. The class of applications we consider
is computation-intensive applications in which the execution cost of a task is greater than
its communication cost. The complexity of MCRP-SP for such applications is proved to be
NP-complete. We present a branch-and-bound method to find an optimal solution as well as
an approximation approach for a suboptimal solution. The numerical results show that such
replication may lead to a lower cost than the optimal assignment (in which each task
is assigned to only one processor) does. The proposed optimal solution has a complexity of
$O(n^2 2^n M)$, while the approximation solution has $O(n^4 M^2)$, where $n$ is the number of processors
in the system and $M$ is the number of tasks in the graph.




1  Introduction 


Distributed computer systems have often resulted in improved reliability, flexibility, throughput,
fault tolerance and resource sharing. In order to use the processors available in a distributed
system, the tasks have to be allocated to the processors. The allocation problem is one of the
basic problems of distributed computing whose solution has a far reaching impact on the usability
and efficiency of a distributed system. Clearly, the tasks of an application have to be executed
satisfying the precedence and other synchronization constraints among them. (Such constraints are
often specified in the form of a task graph.)

In executing an application, defined by its task graph, we have the option of restricting ourselves
to having only one copy of each task. The allocation problem, in this case, is referred to as the
assignment problem. If, on the other hand, a task may be replicated multiple times, the general
problem is called the replication problem. In this paper, we consider the replication problem and
present an algorithm to find the optimal replication of series-parallel graphs for
computation-intensive applications.

For distributed processing applications, the objective of the allocation problem may be the
minimum completion time, processor load balancing, or the total cost of execution and communication,
etc. For the assignment problem where the objective is to minimize the total cost of execution and
interprocessor communication, Stone [11] and Towsley [12] presented $O(n^3 M)$ algorithms for
tree-structure and series-parallel graphs, respectively, of $M$ tasks and $n$ processors. For general task
graphs, the assignment problem has been proven [9] to be NP-complete. Many papers [8][9][10]
presented branch-and-bound methods which yielded an optimal result. Other heuristic methods
have been considered by Lo [7] and Price and Krishnaprasad [5]. All these works focused on the
assignment problem.

Traditionally, the main purpose of replicating a task on multiple processors is to increase the
degree of fault tolerance [2][6]. If some processors in the distributed system fail, the application may
still survive using other copies. In such a communication model, a task has to communicate with
multiple copies of other tasks. As a consequence, the total cost of execution and communication
of the replication problem will be larger than that of the assignment problem. In this paper, we
adopt another communication model in which a task is replicated not for the sake of fault
tolerance but to decrease the total cost. In our model, each task may have more than one copy,
and it may start its execution after receiving the necessary data from one copy of each preceding task.
Clearly, in a heterogeneous environment the cost of execution of a task depends on the processor on
which it executes, and the communication costs depend on the topology, communication medium,
protocols used, etc. When a task $i$ is allowed to have only one copy in the system, the sum
of the interprocessor communication costs between $i$ and other tasks may be large. Sometimes
it is more beneficial to replicate $i$ onto multiple processors to reduce the interprocessor
communication and to fully utilize the available processors in the system. Such replication may
lead to a lower total cost than the optimal assignment does. An example illustrating this
point is presented in Section 3.

In the assignment problem, polynomial-time algorithms exist for special cases, such as
tree-structure [11] and series-parallel [12] task graphs. This paper represents one of the first few attempts
at finding special cases for the replication problem. The class of applications we consider in this
paper is computation-intensive applications in which the execution cost of a task is greater than its
communication cost. Such applications can be found in an enormous number of fields, such as digital
signal processing, weather forecasting, game searching, etc. We formally define a
computation-intensive application in Section 2.2. In this paper, we prove that for
computation-intensive applications the replication problem is NP-complete, and we present a
branch-and-bound algorithm to solve it. The worst-case complexity of the solution is $O(n^2 2^n M)$.
Note that, for a fixed number of processors, this complexity is a linear function of $M$.

We also develop an approximation approach to solve the problem in polynomial time. Given a
forker task $s$ with $K$ successors in the SP graph, the method tries to allocate $s$ to processors based
on iterative selection. The complexity of the iterative selection for a forker is [...], while the
overall solution for an SP graph is $O(n^4 M^2)$.

In the remainder of this paper, the series-parallel graph model and the computation model are
described in Section 2. In Section 3, the replication problem is formulated as a minimum cost
0-1 integer programming problem and the proof of NP-completeness is given. A branch-and-bound
algorithm and numerical results are given in Section 4, while the approximation methods and results
are given in Section 5. The overall algorithm is presented and concluding remarks are drawn in
Section 6.

2 Definitions


2.1  Graph  Model 


A series-parallel (SP) graph, $G = (V, E)$, is a directed graph of type $p$, where
$p \in \{T_{unit}, T_{chain}, T_{and}, T_{or}\}$, and $G$ has a source node (of indegree 0) and a
sink node (of outdegree 0). An SP graph can be constructed by applying the following rules
recursively.


1. A graph $G = (V, E) = (\{v\}, \emptyset)$ is an SP graph of type $T_{unit}$. (Node $v$ is the
source and the sink of $G$.)


2. If $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are SP graphs, then $G' = (V', E')$ is an SP graph of
type $T_{chain}$, where $V' = V_1 \cup V_2$ and
$E' = E_1 \cup E_2 \cup \{\langle \text{sink of } G_1, \text{source of } G_2 \rangle\}$.

3. If each graph $G_i = (V_i, E_i)$ with source-sink pair $(s_i, t_i)$, where $s_i$ is of outdegree 1,
is an SP graph, $\forall i = 1, 2, \ldots, n$, and new nodes $s' \notin V_i$ and $t' \notin V_i$,
$\forall i$, are given, then $G' = (V', E')$ is an SP graph of type $T_{and}$ (or type $T_{or}$), where
$V' = V_1 \cup V_2 \cup \ldots \cup V_n \cup \{s', t'\}$ and
$E' = E_1 \cup E_2 \cup \ldots \cup E_n \cup \{\langle s', s_i \rangle \mid \forall i = 1, 2, \ldots, n\}
\cup \{\langle t_i, t' \rangle \mid \forall i = 1, 2, \ldots, n\}$. The source of $G'$, $s'$, is called
the forker of $G'$. The sink of $G'$, $t'$, is called the joiner of $G'$. $G'$ is an SP graph of type
$T_{and}$ (or type $T_{or}$) if there exists a parallel-and (or parallel-or) relation among the
$G_i$'s.
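
The three construction rules can be mirrored directly in code. The sketch below is an illustration only: the dictionary representation and the function names `unit`, `chain`, and `parallel` are assumptions, and the outdegree-1 condition on each branch source in rule 3 is not enforced.

```python
def unit(v):
    """Rule 1: a single node is an SP graph of type T_unit."""
    return {"type": "unit", "nodes": {v}, "edges": set(), "source": v, "sink": v}


def chain(g1, g2):
    """Rule 2: connect the sink of g1 to the source of g2 (type T_chain)."""
    return {
        "type": "chain",
        "nodes": g1["nodes"] | g2["nodes"],
        "edges": g1["edges"] | g2["edges"] | {(g1["sink"], g2["source"])},
        "source": g1["source"],
        "sink": g2["sink"],
    }


def parallel(kind, forker, joiner, branches):
    """Rule 3: join branches between a new forker and joiner (T_and or T_or)."""
    if kind not in ("and", "or"):
        raise ValueError("kind must be 'and' or 'or'")
    nodes, edges = {forker, joiner}, set()
    for g in branches:
        if forker in g["nodes"] or joiner in g["nodes"]:
            raise ValueError("forker and joiner must be new nodes")
        nodes |= g["nodes"]
        edges |= g["edges"] | {(forker, g["source"]), (g["sink"], joiner)}
    return {"type": kind, "nodes": nodes, "edges": edges,
            "source": forker, "sink": joiner}
```

For example, `parallel("and", "s", "t", [unit("a"), chain(unit("b"), unit("c"))])` builds a fork-join graph with forker `s` and joiner `t`.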


A convenient way of representing the structure of an SP graph is via a parsing tree [4]. The
transformation of an SP graph to a parsing tree can be done in a recursive way. There are four
kinds of internal nodes in a parsing tree: $T_{unit}$, $T_{chain}$, $T_{and}$ and $T_{or}$ nodes. A
$T_{unit}$ node has only one child, while a $T_{chain}$ node has more than one child. Every internal
node $x$, along with all its descendant nodes, induces a subtree $S_x$ which describes an SP subgraph
$G_x$ of $G$. Each leaf node in $S_x$ corresponds to an SP graph of type $T_{unit}$. A $T_{and}$ (or
$T_{or}$) node $y$ consists of its type $T_{and}$ (or $T_{or}$) along with the forker and joiner nodes
of $G_y$. We give an example of an SP graph $G$ and its parsing tree $T(G)$ in Figure 1.

2.2 Computational Model

An application program consists of $M$ tasks labeled $m = 1, 2, \ldots, M$. Its behavior is represented
by an SP graph whose nodes correspond to the tasks. Each task may be replicated onto more
than one processor. A task instance $t_{i,p}$ is a replication of task $i$ on processor $p$. A directed
edge $\langle i, j \rangle$ between nodes $i$ and $j$ exists if the execution of task $j$ follows that of
task $i$. Associated with each edge $\langle i, j \rangle$ is the communication cost incurred by the
application. We are concerned with types of applications where the cost of execution of a task is
always greater than the communication overhead it needs. The model is stated as follows.

Given a distributed system $S$ with $n$ processors connected by a communication network, an
application is computation-intensive if its associated SP graph $G = (V, E)$ on $S$ satisfies the
following conditions:

1. $\mu_{i,j}(p,q) \ge 0$,

2. $\sum_{q=1}^{n} \mu_{i,j}(p,q) < \min_p(e_{i,p})$, $\forall \langle i,j \rangle \in E$ and
$1 \le p \le n$, where

•  $\mu_{i,j}(p,q)$ is the communication cost between tasks $i$ and $j$ when they are assigned to
processors $p$ and $q$ respectively, and

•  $e_{i,p}$ is the execution cost when task $i$ is assigned to processor $p$.

The first condition states that the communication cost between any two task instances (e.g.
$t_{i,p}$ and $t_{j,q}$) is not negative. The second one states that for every edge $\langle i,j \rangle$,
the worst-case communication cost between any task instance $t_{i,p}$ and all its successor task
instances (i.e. $t_{j,q}$, $\forall q$) is less than the minimum execution cost of task $i$.
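
The two conditions can be checked directly for a given cost table. The sketch below assumes that the second condition sums the communication cost over all destination processors q for each source processor p, as the explanatory paragraph above suggests. The data layout (`e[i][p]` for execution costs, `mu[(i, j)][p][q]` for communication costs, processors indexed from 0) and the function name are assumptions for illustration.

```python
def is_computation_intensive(edges, e, mu, n):
    """Check conditions 1 and 2 for every edge <i, j> and every processor p.

    e[i][p]         : execution cost of task i on processor p
    mu[(i, j)][p][q]: communication cost from task i on p to task j on q
    n               : number of processors
    """
    for (i, j) in edges:
        min_exec = min(e[i])
        for p in range(n):
            costs = mu[(i, j)][p]
            if any(c < 0 for c in costs):   # condition 1: non-negative costs
                return False
            if sum(costs) >= min_exec:      # condition 2: total comm < min exec
                return False
    return True
```

For instance, with two processors, execution costs of 3 for the sending task, and a cost of 2 for crossing processors, the condition holds; raising the crossing cost to 3 violates it.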

2.3  Communication  Model 

The communication model we consider is different from that of reliability-oriented replication.
In the reliability-oriented replication problem, the objective is to increase the degree of fault
tolerance. To detect faults and maintain data consistency, each task has to receive multiple copies
of data from several task instances if its predecessor is replicated in more than one place.

The purpose of the replication problem considered in this paper is to decrease the sum of
execution and communication costs. Under such consideration, there is no need to enforce plural
communication between any two task instances. Hence, we propose the 1-out-of-n communication
model. In the model, for each edge $\langle i, j \rangle \in E$, a task instance $t_{j,q}$ may start its
execution if it receives the data from any one task instance of its predecessor, task $i$.


3  Problem  Formulation  and  Complexity 


Based on the computational model presented in Section 2.2, the problem of minimizing the total
sum of execution and communication costs for an SP task graph can be approached by replication
of tasks. An example where the replication may lead to a lower sum of execution costs and
communication costs is given in Figure 2, where the number of processors in the system is two, and
the execution costs and communication costs are listed in the $e$ table and the $\mu$ table
respectively. If each task is allowed to run on at most one processor, then the optimal allocation
is to assign task $a$ to processor 1, $b$ to 1, $c$ to 1, $d$ to 2, $e$ to 2, and $f$ to 1. The minimum
cost is 68. However, if each task is allowed to be replicated into more than one copy (i.e. to
replicate task $c$ to processors 1 and 2), then the cost is 67.

We introduce integer variables $X_{i,p}$, $\forall\, 1 \le i \le M$ and $1 \le p \le n$, to formulate
the problem, where each $X_{i,p} = 1$ if task $i$ is replicated on processor $p$, and $= 0$ otherwise.
We define a binary function $\delta(x)$: if $x > 0$ then $\delta(x) = 1$, else $\delta(x) = 0$. We also
associate an allocated flag $F(w)$ with each node $w$ in the parsing tree, where $F(w) = 1$ if the
allocation for tasks in the subtree $S_w$ is valid, and $= 0$ otherwise. A valid allocation for the
tasks in $S_w$ is an allocation that follows the semantics of $T_{chain}$, $T_{and}$, and $T_{or}$
subgraphs. A valid allocation is not necessarily an allocation in which each task in $S_w$ is
allocated to at least one processor. Some tasks in $T_{or}$ subgraphs may be neglected without
affecting the successful execution of an SP graph.

Given an SP graph $G$, its parsing tree $T(G)$ and any internal node $w$ in $T(G)$, the allocated flag
$F(w)$ can be recursively computed:

1. if $w$ is a $T_{unit}$ node with a child $i$, then
$F(w) = F(i) = \delta\big(\sum_{p=1}^{n} X_{i,p}\big)$.

2. if $w$ is a $T_{chain}$ node with $c$ children, then
$F(w) = F(child_1) \times F(child_2) \times \ldots \times F(child_c)$.

3. if $w$ is a $T_{and}$ node with forker $s$, joiner $t$ and $c$ children, then
$F(w) = F(s) \times F(t) \times F(child_1) \times F(child_2) \times \ldots \times F(child_c)$.

4. if $w$ is a $T_{or}$ node with forker $s$, joiner $t$ and $c$ children, then
$F(w) = F(s) \times F(t) \times \delta(F(child_1) + F(child_2) + \ldots + F(child_c))$.
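
The recursion for the allocated flag can be sketched as follows. This is an illustration under stated assumptions: the parsing tree is encoded as nested tuples, `X[task]` is the list of 0/1 replication variables for a task, and the flag of a single task is taken to be $\delta$ of the sum of its replication variables (that is, 1 when the task is placed on at least one processor). The encoding and all function names are hypothetical.

```python
def delta(x):
    """Binary function: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0


def task_flag(task, X):
    """Assumed flag of one task: 1 if it is placed on at least one processor."""
    return delta(sum(X[task]))


def allocated_flag(node, X):
    """Allocated flag F(w) of a parsing-tree node.

    Node encodings (assumed):
      ("unit", task)
      ("chain", [children])
      ("and", forker, joiner, [children])
      ("or", forker, joiner, [children])
    """
    kind = node[0]
    if kind == "unit":
        return task_flag(node[1], X)
    if kind == "chain":
        flag = 1
        for child in node[1]:
            flag *= allocated_flag(child, X)
        return flag
    if kind not in ("and", "or"):
        raise ValueError("unknown node type: %r" % (kind,))
    _, forker, joiner, children = node
    child_flags = [allocated_flag(child, X) for child in children]
    if kind == "and":
        combined = 1
        for f in child_flags:
            combined *= f            # every branch must be allocated
    else:
        combined = delta(sum(child_flags))  # at least one branch suffices
    return task_flag(forker, X) * task_flag(joiner, X) * combined
```

The difference between the two parallel types shows up when one branch is left unallocated: a $T_{or}$ node is still valid, a $T_{and}$ node is not.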

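The minimum cost replication problem for SP graphs, MCRP-SP, can be formulated as a 0-1
integer programming problem, i.e.:

$$Z = \text{Minimize} \Big[ \sum_{i,p} X_{i,p}\, e_{i,p} + \sum_{\langle i,j \rangle \in E} \sum_{q} \min_{p \mid X_{i,p}=1} \big( \mu_{i,j}(p,q)\, X_{j,q} \big) \Big]$$

subject to $F(r) = 1$, where $r$ is the root of $T(G)$ and $X_{i,p} = 0$ or $1$, $\forall i, p$. (1)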
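
To make the objective concrete, the sketch below evaluates the total cost of a given replication under the 1-out-of-n model and searches a tiny instance exhaustively. It rests on assumptions: each copy of task j is charged the cheapest edge from any copy of its predecessor i, every task is required to have at least one copy (so the option of neglecting tasks in $T_{or}$ subgraphs is ignored), and the five-task example data are invented for illustration. The exhaustive search is not the branch-and-bound method of the paper.

```python
from itertools import product


def replication_cost(X, e, mu, edges):
    """Execution cost of all copies plus 1-out-of-n communication cost."""
    n = len(next(iter(e.values())))
    cost = sum(X[i][p] * e[i][p] for i in e for p in range(n))
    for (i, j) in edges:
        senders = [p for p in range(n) if X[i][p]]
        for q in range(n):
            if X[j][q]:
                # Each copy of j needs data from only one copy of i.
                cost += min(mu[(i, j)][p][q] for p in senders)
    return cost


def best_cost(e, mu, edges, max_copies):
    """Exhaustive minimum over placements with 1..max_copies copies per task."""
    tasks = list(e)
    n = len(e[tasks[0]])
    choices = [v for v in product((0, 1), repeat=n) if 1 <= sum(v) <= max_copies]
    return min(
        replication_cost(dict(zip(tasks, combo)), e, mu, edges)
        for combo in product(choices, repeat=len(tasks))
    )


# Hypothetical instance: forker "s" with four successors on two processors.
example_e = {"s": [3, 3], "a": [1, 100], "b": [1, 100], "c": [100, 1], "d": [100, 1]}
cross = [[0, 2], [2, 0]]  # free on the same processor, 2 across processors
example_mu = {("s", succ): cross for succ in ("a", "b", "c", "d")}
example_edges = list(example_mu)
```

On this instance the best single-copy assignment costs 11, while replicating the forker `s` on both processors lowers the total to 10, because the extra execution cost of 3 is outweighed by the saved communication cost of 4.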

The restricted problem, which allows each task to run on at most one processor, has the following
formulation:

$$Z = \text{Minimize} \Big[ \sum_{i,p} X_{i,p}\, e_{i,p} + \sum_{\langle i,j \rangle \in E,\; p,q} \mu_{i,j}(p,q)\, X_{i,p}\, X_{j,q} \Big]$$

subject to $\sum_{p=1}^{n} X_{i,p} \le 1$ and $F(r) = 1$,

where $r$ is the root of $T(G)$ and $X_{i,p} = 0$ or $1$, $\forall i, p$. (2)

The task assignment problem (2) for SP graphs of $M$ tasks onto $n$ processors has been solved
in $O(n^3 M)$ time [12]. However, the multiprocessor task assignment for general types of task graphs
without replication has been reported to be NP-complete [9]. As for the MCRP-SP problem, it
can be shown to be NP-complete. In this paper, we solve the problem for computation-intensive
applications and present an algorithm that is linear in the number of tasks when the number of
processors is fixed.

3.1 Assignment Graph


Bokhari [1] introduced the assignment graph to solve the task assignment problem (2). To prove
the NP-completeness of Problem (1) and solve the problem, we also adopt the concept of the
assignment graph of an SP graph. The assignment graph of an SP graph can be defined similarly.
The following definitions apply to the assignment graph, and Figure 3 shows an assignment graph
for an SP graph.

1. It is a directed graph with weighted nodes and edges.

2. It has $M \times n$ nodes. Each weighted node is labeled with a task instance, $t_{i,p}$.

3. A layer $i$ is the collection of $n$ weighted nodes ($t_{i,1}$, $t_{i,2}$, ..., and $t_{i,n}$). Each
layer of the graph corresponds to a node in the SP graph. The layer corresponding to the source
(sink) is called the source (sink) layer.

4. A part of the assignment graph that corresponds to an SP subgraph of type $T_{chain}$, $T_{and}$
or $T_{or}$ is called a $T_{chain}$, $T_{and}$ or $T_{or}$ limb respectively.

5. Communication costs are accounted for by giving the weight $\mu_{i,j}(p,q)$ to the edge going
from $t_{i,p}$ to $t_{j,q}$.

6. Execution costs are assigned to the corresponding weighted nodes.
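
The definitions above translate into a simple data structure. The following sketch builds the weighted nodes, weighted edges, and layers of an assignment graph from cost tables; the layout (`e[i][p]`, `mu[(i, j)][p][q]`, processors indexed from 0) and the function name are assumptions for illustration, and limbs are not modeled.

```python
def build_assignment_graph(edges, e, mu):
    """Build the assignment graph of a task graph.

    Returns (node_weight, edge_weight, layers):
      node_weight[(i, p)]           : execution cost of task instance t_{i,p}
      edge_weight[((i, p), (j, q))] : communication cost from t_{i,p} to t_{j,q}
      layers[i]                     : the n task instances of task i
    """
    n = len(next(iter(e.values())))
    node_weight = {(i, p): e[i][p] for i in e for p in range(n)}
    edge_weight = {
        ((i, p), (j, q)): mu[(i, j)][p][q]
        for (i, j) in edges
        for p in range(n)
        for q in range(n)
    }
    layers = {i: [(i, p) for p in range(n)] for i in e}
    return node_weight, edge_weight, layers
```

For M tasks and n processors this yields M x n weighted nodes and n x n weighted edges per edge of the task graph.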

Given an assignment graph, Bokhari [1] solves Problem (2) by selecting one weighted node
from each layer and including the weighted edges between any two selected nodes. This resulting
subgraph is called an allocation graph. To solve Problem (1), more than one weighted node from
each layer may be chosen. Similarly, a replication graph for Problem (1) can be constructed from
an assignment graph by including all selected nodes and edges between these nodes. Examples of
an allocation graph and a replication graph are shown in Figure 4 for the assignment graph shown
in Figure 3. Note that for each node $x$ in the replication graph there is only one edge incident to
it from each predecessor layer of $x$.

In a replication graph, each layer may have more than one selected node. Let the variable
$X_l = (X_{l,1}, X_{l,2}, \ldots, X_{l,n})$ be a replication vector for layer $l$ in a replication graph.
We define the minimum activation cost of vector $X_l$ for layer $l$, $A_l(X_l)$, to be the minimum sum
of the weights of all possible nodes and edges leading to the selected nodes of layer $l$ in a
replication graph. Then the goal of Problem (1) can be achieved by computing the minimal value of
$A_{sink}(X_{sink}) + \sum_{p=1}^{n} X_{sink,p}\, e_{sink,p}$ over all possible values of $X_{sink}$.


3.2  Complexity 

In this section, we show that Problem (1) for a computation-intensive application is NP-complete.
We first prove the following lemmas.

Lemma 1: For any layer $l$ in the replication graph, the minimum activation cost for two selected
nodes $t_{l,p}$ and $t_{l,q}$ will always be greater than that for either node $t_{l,p}$ or $t_{l,q}$ only.

Proof: The lemma can be proven by contradiction. Let $A_1$ be the minimum activation cost for the
two nodes $t_{l,p}$ and $t_{l,q}$, and let $A_2$ and $A_3$ be the minimum costs for $t_{l,p}$ and
$t_{l,q}$ respectively. Assume that $A_1 < A_2$ and $A_1 < A_3$. Since $A_1$ includes the activation
cost of node $t_{l,p}$, an activation cost for $t_{l,p}$ only can be obtained from $A_1$. The obtained
value $c$ is not necessarily the minimum value for $t_{l,p}$; hence $A_2 \le c$. The value $c$ is
obtained by removing some weighted nodes and edges from the replication graph. This implies that
$c \le A_1$. From the above, we find that $A_2 \le A_1$, which contradicts the assumption. The same
reasoning can be applied to $A_3$ and reaches a contradiction. Therefore, the assumptions are
incorrect and Lemma 1 holds.


□ 

Lemma 1 can be further extended to the cases where more than two weighted nodes are chosen.
The conclusion we can draw is that the more nodes are selected from a layer, the bigger the
activation cost is.

Lemma 2: Given a computation-intensive application with its SP task graph $G = (V, E)$ and its
assignment graph, if node $i$ has outdegree one and edge $\langle i,j \rangle \in E$, then for any vector
$X_j$, the minimal activation cost $A_j(X_j)$ can be obtained by choosing only one weighted node from
layer $i$ (i.e. $\sum_{p=1}^{n} X_{i,p} = 1$).

Proof:  The  Lemma  can  be  proven  by  contradiction.  Since  node  i  has  outdegree  one  and  edge 
$$\le A_i(X_i^0) + \sum_{p=1}^{n} X_{i,p}^0\, e_{i,p} + \sum_{q=1}^{n} \min_{p \mid X_{i,p}^0=1} \big( X_{j,q}\, \mu_{i,j}(p,q) \big) = m.$$


The  result,  m’  <  m,  contradicts  our  assumption.  It  means  that  the  assumption  is  wrong  and 
Lemma  2  holds. 


□ 

Lemma 3: Given a computation-intensive application with its SP task graph $G$, the objective of
the minimum cost can be achieved by considering only the replication of the forkers.

Proof: We proceed to prove the lemma by contradiction. Let the minimum cost for the task replication
problem be $z_0$ if only the forkers (i.e. outdegree > 1) are allowed to run on more than one processor.
Assume the total cost can be reduced further by replicating some task $i$ which is not a forker. Then
there are two possible cases for $i$:

1. $i$ has outdegree 0.

2. $i$ has outdegree 1.

In case 1, $i$ is the sink of the whole graph. Also $i$ may be the joiner of some SP subgraphs. If $i$ is
allowed to run on an extra processor $b$, which is different from the one to which $i$ is initially
assigned (when $z_0$ is obtained), then the new cost will be
$z_0 + e_{i,b} + \sum_{\langle d,i \rangle \in E} \mu_{d,i}(p,b)$. Apparently, the new cost is greater
than $z_0$. This contradicts our assumption that the total cost can be reduced further
by replicating task $i$.

In case 2, $i$ has one successor. Let $\langle i,j \rangle \in E$. From the assumption, we know that the
replication of $i$ can reduce the total cost. Hence, the minimum activation cost for task instances
in layer $j$, $A_j(X_j)$, is obtained when task $i$ is replicated onto more than one processor. This
contradicts Lemma 2. Hence, the assumption is incorrect and the objective of the minimum cost
can be achieved by considering only the replication of the forkers.


D 

Lemma  3  tells  that,  given  an  SP  graph,  if  we  can  find  out  the  optimal  replication  for  the  forkers. 
Problem  (1)  for  computation-intensive  applications  can  be  solved.  Now,  we  show  that  the  problem 
$\langle i,j \rangle \in E$, we know that

\[
A_j(X_j) = \min_{X_i} \Big\{ A_i(X_i) + \sum_{p=1}^{n} X_{i,p} * e_{i,p}
  + \sum_{q=1}^{n} \min_{p:\,X_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \Big\}.
\]

Let us assume that the above equation reaches a minimal value $m$ when more than one node
from layer $i$ is selected and the optimal replication vector is $X_i^o$. Since $\sum_{p=1}^{n} X_{i,p} > 1$ for $X_i^o$, we
may remove one selected node from layer $i$ and obtain a new vector $X_i'$. Without loss of generality,
let us remove $t_{i,r}$. By removing node $t_{i,r}$, a new value $m'$ is obtained. Since $m$ is the minimum
value for layer $i$, it implies that $m \le m'$.

From Lemma 1, we obtain that $A_i(X_i') \le A_i(X_i^o)$. And for a computation-intensive application,
the following holds: $\mu_{i,j}(p,q) < \min_p(e_{i,p})$, $\forall\, 1 \le p \le n$. Then,

\[
\begin{aligned}
m' &= A_i(X_i') + \sum_{p=1}^{n} X'_{i,p} * e_{i,p}
      + \sum_{q=1}^{n} \min_{p:\,X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \\
   &\le A_i(X_i^o) + \sum_{p=1}^{n} X'_{i,p} * e_{i,p}
      + \sum_{q=1}^{n} \min_{p:\,X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \\
   &= A_i(X_i^o) + \Big( \sum_{p=1}^{n} X^o_{i,p} * e_{i,p} - e_{i,r} \Big)
      + \sum_{q=1}^{n} \min_{p:\,X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \\
   &= A_i(X_i^o) + \sum_{p=1}^{n} X^o_{i,p} * e_{i,p}
      + \Big[ \sum_{q=1}^{n} \min_{p:\,X'_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) \Big] - e_{i,r} \\
   &\;\;\vdots \\
   &< A_i(X_i^o) + \sum_{p=1}^{n} X^o_{i,p} * e_{i,p} \\
   &\le A_i(X_i^o) + \sum_{p=1}^{n} X^o_{i,p} * e_{i,p}
      + \sum_{q=1}^{n} \min_{p:\,X^o_{i,p}=1} \big( X_{j,q} * \mu_{i,j}(p,q) \big) = m.
\end{aligned}
\]

The result, $m' < m$, contradicts our assumption. It means that the assumption is wrong and
Lemma 2 holds.


□ 

Lemma 3: Given a computation-intensive application with its SP task graph $G$, the objective of
the minimum cost can be achieved by considering only the replication of the forkers.

Proof: We proceed to prove the lemma by contradiction. Let the minimum cost for the task replication
problem be $z_0$ if only the forkers (i.e. outdegree > 1) are allowed to run on more than one processor.
Assume the total cost can be reduced further by replicating some task $i$ which is not a forker. Then
there are two possible cases for $i$:

1. $i$ has outdegree 0.

2. $i$ has outdegree 1.

In case 1, $i$ is the sink of the whole graph. Also $i$ may be the joiner of some SP subgraphs. If $i$ is
allowed to run on an extra processor $b$, which is different from the one to which $i$ is initially assigned
(when $z_0$ is obtained), then the new cost will be $z_0 + e_{i,b} + \sum_{\langle d,i \rangle \in E} [\ldots]$. Apparently, the new
cost is greater than $z_0$. This contradicts our assumption that the total cost can be reduced further
by replicating task $i$.

In case 2, $i$ has one successor. Let $\langle i,j \rangle \in E$. From the assumption, we know that the
replication of $i$ can reduce the total cost. Hence, the minimum activation cost for task instances
in layer $j$ is obtained when task $i$ is replicated onto more than one processor. This
contradicts Lemma 2. Hence, the assumption is incorrect and the objective of the minimum cost
can be achieved by considering only the replication of the forkers.

□

Lemma 3 tells us that, given an SP graph, if we can find the optimal replication for the forkers,
Problem (1) for computation-intensive applications can be solved. Now, we show that the problem
of finding an optimal replication for the forkers in an SP graph is NP-complete. First, a special
form of the replication problem is introduced.

The Uni-Cost Task Replication (UCTR) problem is stated as follows:

INSTANCE: Graph $G' = (V', E')$, $V' = V_1' \cup V_2'$, $|V_1'| = n$ and $|V_2'| = m$. If $x \in V_1'$ and
$y \in V_2'$ then edge $\langle x,y \rangle \in E'$ (i.e. $|E'| = m \times n$). For each $x \in V_1'$, there is an activation cost
$m$. Associated with each edge $\langle x,y \rangle \in E'$, there is a communication cost $d_{x,y} = n \times m$ or $0$. A
positive integer $K \le n \times m$ is also given.

QUESTION: Is there a feasible subset $V_k \subseteq V_1'$ such that we have

\[
\Big( \sum_{x \in V_k} m + \sum_{y \in V_2'} \min_{x \in V_k} (d_{x,y}) \Big) \le K\,? \qquad (3)
\]

[Theorem 1]: The Uni-Cost Task Replication problem is NP-complete.

[Proof]: The problem is in NP because a subset $V_k$, if it exists, can be checked to see if the sum
of activation costs and communication costs is less than or equal to $K$. We shall now transform
the VERTEX COVER [3] problem to this problem. Given any graph $G = (V,E)$ and an integer $C
\le |V|$, we shall construct a new graph $G' = (V',E')$ with $V' = V_1' \cup V_2'$, such that there exists a
VERTEX COVER of size $C$ or less in $G$ if and only if there is a feasible subset of $V_1'$ in $G'$. Let
$|V| = n$ and $|E| = m$. To construct $G'$, (1) we create a vertex $v_i$ for each node in $V$, (2) we
number the edges in $E$, and (3) we create a vertex $b_j$ for each edge $\langle u,v \rangle \in E$ where $u, v \in V$.
We define $K = m \times C$, $V_1' = \{v_1, v_2, \ldots, v_n\}$, $V_2' = \{b_1, b_2, \ldots, b_m\}$ and $E' = \{\langle v_x, b_y \rangle \mid v_x \in
V_1', b_y \in V_2'\}$. The communication cost $d_{x,y} = 0$ if $v_x$ is an end point of the corresponding edge of
vertex $b_y$; and $d_{x,y} = n \times m$ otherwise. An illustration, where $n = 7$ and $m = 9$, is shown in Figure 5.

Let us now argue that there exists a vertex cover of size $C$ or less in $G$ if and only if there is
a feasible subset of $V_1'$ in $G'$ for which the sum of activation cost and communication cost is
$m \times C$ or less. Suppose there is a vertex cover of size $C$; then for each vertex $b_y$ ($= \langle u,v \rangle$) in $V_2'$,
at least one of $u$ and $v$ belongs to the vertex cover. By selecting all the vertices in the vertex cover
into the subset of $V_1'$, we know that the sum in Eq. (3) will be $m \times C$. Since $C \le n$, it implies that
$m \times C \le n \times m$.

Conversely, consider any feasible subset $V_k \subseteq V_1'$ such that the total cost is equal to or less than
$m \times C$. We can see that the second term of Eq. (3) (i.e. the sum of communication cost) must be
zero. Suppose, for some $b_y \in V_2'$, the minimum communication cost between $b_y$ and the vertices in $V_k$
is nonzero; then the communication cost will be at least $m \times n$. Since $C \le n$, it implies that $m \times n
\ge m \times C$. The total cost in Eq. (3) will then be greater than $m \times C$, which is a contradiction. Thus, for
every vertex in $V_2'$, the minimum communication cost to the vertices in $V_k$ is zero. It means that
at least one of the two end points of each edge in $E$ belongs to $V_k$. Since there are at most $C$ vertices in
$V_k$ (the activation cost for each vertex is $m$), by selecting the vertices in $V_k$ we obtain a vertex
cover of size $C$ or less in $G$.

□
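The construction used in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vertices are numbered from 0, the helper names (`build_uctr`, `uctr_cost`, `has_feasible_subset`, `has_vertex_cover`) are hypothetical, and both decision problems are checked by brute force, so the sketch is usable on tiny graphs only.

```python
from itertools import combinations

def build_uctr(n, edges, C):
    """Build the UCTR instance for a graph with vertices 0..n-1 (assumed numbering)."""
    m = len(edges)
    big = n * m  # communication cost when the vertex is not an end point of the edge
    # d[x][y] = 0 if vertex x is an end point of edge y, else n*m
    d = [[0 if x in edges[y] else big for y in range(m)] for x in range(n)]
    K = m * C
    return m, d, K

def uctr_cost(subset, m, d):
    """Left-hand side of Eq. (3): activation costs plus minimum communication costs."""
    activation = m * len(subset)
    communication = sum(min(d[x][y] for x in subset) for y in range(len(d[0])))
    return activation + communication

def has_feasible_subset(n, m, d, K):
    """Brute-force check of the UCTR question over all non-empty subsets."""
    return any(uctr_cost(s, m, d) <= K
               for size in range(1, n + 1)
               for s in combinations(range(n), size))

def has_vertex_cover(n, edges, C):
    """Brute-force check for a vertex cover of size C or less."""
    return any(all(u in s or v in s for u, v in edges)
               for size in range(C + 1)
               for s in combinations(range(n), size))
```

On a path graph with edges (0,1), (1,2), (2,3), for instance, both checks succeed for $C = 2$ and both fail for $C = 1$, which is the equivalence the proof establishes.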

[Theorem 2]: The problem, MCRP-SP for computation-intensive applications, is NP-complete.

[Proof]: From Lemma 3, we know that only the forker in an SP graph of type $T_{and}$ needs to run on
more than one processor. Consider the following recognition version of Problem (1) for SP graphs
of type $T_{and}$:

Given a distributed system of $n$ processors, an SP graph $G^a = (V^a, E^a)$ of type $T_{and}$, its
assignment graph $H$ and two positive integers $m$ and $r$. Let $r$ be a multiple of $m$, $V^a = \{s, t,
1, 2, \ldots, r\}$ and $E^a = \{\langle s,i \rangle \mid i = 1, 2, \ldots, r\} \cup \{\langle i,t \rangle \mid i = 1, 2, \ldots, r\}$. Task $s$ ($t$) is the forker
(joiner) of $G^a$. Execution cost $e_{i,p}$ and communication cost $\mu_{i,j}(p,q)$ are defined in $H$, $\forall\, \langle i,j \rangle
\in E^a$ and $\forall\, 1 \le p,q \le n$. Integer variable $X_{i,p} = 1$ if task $i$ is assigned to processor $p$; and $= 0$
otherwise. When a positive integer $K \le r$ is given, is there an assignment of $X_{i,p}$'s such that

\[
\Big[ \sum_{i \in V^a} \sum_{p=1}^{n} X_{i,p} * e_{i,p}
  + \sum_{\langle i,j \rangle \in E^a} \sum_{q=1}^{n} \min_{p:\,X_{i,p}=1} \big( \mu_{i,j}(p,q) * X_{j,q} \big) \Big] \le K\,?
\]

\[
\text{where } \sum_{p=1}^{n} X_{i,p} = 1,\ \forall\, i \ne s, \text{ and } \sum_{p=1}^{n} X_{i,p} \ge 1, \text{ if } i = s. \qquad (4)
\]

We shall transform the UCTR problem to this problem. Given any graph $G' = (V_1' \cup V_2', E')$
considered in the UCTR problem, we construct an SP graph of type $T_{and}$, $G^a = (V^a, E^a)$, and its
assignment graph $H$, such that $G'$ has a feasible subset of $V_1'$ for which the sum in Eq. (3) is $K$ or
less if and only if there is an assignment of $X_{i,p}$'s for $G^a$ and $H$ that satisfies Eq. (4). Let $|V_1'| = n$ and
$|V_2'| = m$; then the unit cost $f = n \times m$. Assign $r = m \times f$ ($= n \times m^2$) and $K = n \times m$. The
forker and joiner of $G^a$ are $s$ and $t$ respectively. Then $V^a = \{s, t, 1, 2, \ldots, r\}$ and $E^a = \{\langle s,i \rangle \mid i
= 1, 2, \ldots, r\} \cup \{\langle i,t \rangle \mid i = 1, 2, \ldots, r\}$. We assign the execution costs and communication costs in
$H$ as follows. An illustration, where $m = 2$ and $n = 3$, is shown in Figure 7.

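• $\forall\, 1 \le p \le n$, $e_{s,p} = m$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$, if $p = 1$ then $e_{i,p} = 0$ else $e_{i,p} = r$.

• $\forall\, 1 \le p \le n$, if $p = 1$ then $e_{t,p} = 0$ else $e_{t,p} = r$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$, let $y = (i - 1)$ div $(m \times n)$, where div is the integral division. If
  the UCTR communication cost between $v_p$ and $b_{y+1}$ is nonzero then $\mu_{s,i}(p,1) = 1$ else $\mu_{s,i}(p,1) = 0$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$, $\forall\, q \ne 1$, $\mu_{s,i}(p,q) = 0$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p,q \le n$, $\mu_{i,t}(p,q) = 0$.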

It is easy to verify that the SP graph constructed by the above rules is of type $T_{and}$ and
computation-intensive. For each node in $V_2'$ of $G'$, we create $f$ nodes in $G^a$, where the communication
cost between each node and source $s$ is either one or zero.

Let us now argue that there exists a feasible subset of $V_1'$ for the UCTR problem if and only if there
exists a valid assignment of $X_{i,p}$'s such that the total sum in Eq. (4) is $K$ or less. Suppose a feasible
subset $V_k$ of $V_1'$ exists such that the sum in Eq. (3) is $C$ ($\le K$). Let $V_1'$ be $\{v_1, v_2, \ldots, v_n\}$. Then we
can obtain a valid assignment by letting $X_{i,1} = 1$, $X_{i,2} = 0$, \ldots, $X_{i,n} = 0$, $\forall\, 1 \le i \le r$, and $X_{t,1} =
1$, $X_{t,2} = 0$, \ldots, $X_{t,n} = 0$, and $X_{s,p} = 1$ if $v_p \in V_k$; and $= 0$ if $v_p \notin V_k$, $\forall\, 1 \le p \le n$. Since
each node $z$ in $V_2'$ corresponds to $f$ nodes in $G^a$, the communication cost between
node $z$ and any node ($v_p$) in $V_1'$ is equal to the total communication cost between these $f$ nodes
and the corresponding task instance of the source ($t_{s,p}$) in $G^a$. By summing up all the costs, we obtain that the
total sum is $C$. Since $C \le K \le n \times m \le r$, this is a valid assignment.

Conversely, if there exists an assignment of $X_{i,p}$'s such that the sum in Eq. (4) is $K$ or less,
then the following must be true: $X_{i,1} = 1$, $X_{i,2} = 0$, \ldots, $X_{i,n} = 0$, $\forall\, 1 \le i \le r$, and $X_{t,1} = 1$,
$X_{t,2} = 0$, \ldots, $X_{t,n} = 0$. This is because, for some $p \ne 1$, if $X_{i,p} = 1$ then the sum must be greater than
$r$, which causes a conflict. Hence the second term in Eq. (4) must be zero. Thus, we may obtain a
subset of $V_1'$ for the UCTR problem by selecting node $v_p \in V_1'$ if $X_{s,p}$ equals 1. Since the first term in
Eq. (3) is equivalent to the first term in Eq. (4), the total sum for the UCTR problem will then also be $K$
or less.

□


4 Optimal Replication for SP Graphs of Type $T_{and}$

In this section, we develop the branch-and-bound algorithm to find an optimal solution for $T_{and}$
subgraphs. The non-forker nodes only need to run on one processor. Hence, an optimal assignment
of non-forker nodes can be done after an optimal replication for forkers is obtained.

4.1  A  Branch-and-Bound  Method  for  Optimal  Replication 

Consider a $T_{and}$ SP graph with forker-joiner pair $(s,h)$ shown in Figure 6. There are $B$ subgraphs
connected by $s$ and $h$. These $B$ subgraphs have a parallel-and relationship. Since the joiner $h$ has
only one copy in an optimal solution (i.e. $\sum_{q=1}^{n} X_{h,q} = 1$), we decompose the minimum cost replication
problem $P$ for a $T_{and}$ SP graph into $n$ subproblems $P^q$, $q = 1, 2, \ldots, n$, where $P^q$ is to find the
minimum cost when the joiner is assigned to processor $q$ (i.e. $X_{h,q} = 1$).

Given a joiner instance $t_{h,q}$, subgraphs $G_b$'s, $b = 1, 2, \ldots, B$, and the minimum costs $C^b_{p,q}$'s
between each forker instance $t_{s,p}$ and joiner instance $t_{h,q}$, $\forall\, 1 \le p \le n$ and $1 \le b \le B$, we further
decompose problem $P^q$ into $n$ subproblems $P^q_k$, $k = 1, 2, \ldots, n$, where $k$ is the number of replicated
copies that the forker $s$ has. Basically, $P^q_k$ means the problem of finding an optimal replication for
$k$ copies of forker $s$ where the joiner $h$ is assigned to processor $q$. Since the problem of finding an
optimal replication for forker $s$ is NP-complete, we propose a branch-and-bound algorithm for each
subproblem $P^q_k$.

We sort the forker instances $t_{s,p}$'s according to their execution costs $e_{s,p}$'s into non-decreasing order.
Without loss of generality, we assume $e_{s,1} \le e_{s,2} \le \ldots \le e_{s,n}$. We represent all the possible
combinations in which $s$ may be replicated by a combination tree with $\binom{n}{k}$ leaf nodes. To keep the
solution efficient, we do not consider all combinations, since that is time-consuming. We apply a
least-cost branch-and-bound algorithm to find an optimal solution by traversing a small portion of
the combination tree.

During the search, we maintain a variable $\hat{z}$ to record the minimum value known so far. The
search is done by the expansion of intermediate nodes. Each intermediate node $v$ at level $y$ represents
a combination of $y$ out of $n$ forker instances. The expansion of node $v$ generates at most $n - y$
child nodes, where each child node inherits $y$ forker instances from $v$ and adds one distinct forker
instance to itself. For example, if node $v$ is represented by $\prec t_{s,i_1}, t_{s,i_2}, \ldots, t_{s,i_y} \succ$, where $i_1 < i_2
< \ldots < i_y$, then $\prec t_{s,i_1}, t_{s,i_2}, \ldots, t_{s,i_y}, t_{s,i_y+j} \succ$ represents a possible child node of $v$, $\forall\, 1 \le j \le
n - i_y$. A combination tree, where $k = 4$ and $n = 6$, is shown in Figure 8. At any intermediate node
of a combination tree, we apply an estimation function to compute the least cost this node can
achieve. If the estimated cost is greater than $\hat{z}$, then we prune the node and the further expansion
of the node is not necessary. Otherwise, we insert this node along with its estimated cost into a
queue. The nodes in the queue are sorted into non-decreasing order of their estimated costs, where
the first node of the queue is always the next one to be expanded. When the expansion reaches
a leaf node, the actual cost of this leaf is computed. If the cost is less than $\hat{z}$, we update $\hat{z}$. The
algorithm terminates when the queue is empty.
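The node expansion rule described above can be illustrated with a short Python sketch. It assumes forker instances are identified by their sorted indices 1..n and that a node is a strictly increasing tuple of indices; the function names are hypothetical and the sketch enumerates the whole tree without any pruning.

```python
from math import comb

def children(node, n):
    """Children of a combination-tree node: append one index larger than the last one."""
    last = node[-1] if node else 0
    return [node + (j,) for j in range(last + 1, n + 1)]

def level(n, k):
    """All nodes at level k, obtained by repeated expansion from the empty root."""
    frontier = [()]
    for _ in range(k):
        frontier = [c for v in frontier for c in children(v, n)]
    return frontier
```

For $k = 4$ and $n = 6$, `level(6, 4)` yields the $\binom{6}{4} = 15$ leaves of the tree, and a node whose last index is $i_y$ has exactly $n - i_y$ children.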


4.1.1  The  Estimation  Function 

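The proposed branch-and-bound algorithm is characterized by the estimation function. Let node $v$
be at level $y$ of the combination tree associated with subproblem $P^q_k$ and be represented by $\prec t_{s,i_1},
t_{s,i_2}, \ldots, t_{s,i_y} \succ$, where $i_1 < i_2 < \ldots < i_y$. Any leaf node that can be reached from node $v$ needs
$k - y$ more forker instances. Let $\ell = \prec j_1, j_2, \ldots, j_{k-y} \succ$ be a tuple of $k - y$ instances chosen from
the remaining $n - i_y$ instances, where $j_1 < j_2 < \ldots < j_{k-y}$. Let $L$ be the set of all possible $\ell$'s. Let
$g(v)$ be the smallest cost among all leaf nodes that can be reached from node $v$:

\[
g(v) = \min_{\ell \in L} \Big\{ \sum_{a=1}^{y} e_{s,i_a} + \sum_{j \in \ell} e_{s,j}
  + \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y\} \cup \ell} \big( C^b_{p,q} \big) \Big\} + e_{h,q}.
\]

Since computing $g(v)$ exactly requires examining every tuple in $L$, we use the following estimation function
$est(v)$ to approximate $g(v)$:

\[
est(v) = \sum_{a=1}^{y} e_{s,i_a} + \sum_{j=i_y+1}^{i_y+k-y} e_{s,j}
  + \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y,\, i_y+1,\, i_y+2, \ldots, n\}} \big( C^b_{p,q} \big) + e_{h,q}. \qquad (5)
\]

Since

\[
\sum_{j=i_y+1}^{i_y+k-y} e_{s,j} \le \sum_{j \in \ell} e_{s,j}, \quad \text{and} \quad
\sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y,\, i_y+1,\, i_y+2, \ldots, n\}} \big( C^b_{p,q} \big)
\le \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y\} \cup \ell} \big( C^b_{p,q} \big),
\]

it is easy to see that $est(v) \le g(v)$. Hence, we use $est(v)$ as the lower bound of the objective
function at node $v$.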
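The lower-bound property can be checked numerically on small instances. The sketch below assumes a cost model reconstructed from the text: processors are indexed from 0, `e_s` is already sorted in non-decreasing order, `C[b][p][q]` holds the minimum cost through subgraph b from forker instance p to joiner instance q, and a node is a non-empty increasing tuple of processor indices. All function and parameter names are hypothetical.

```python
from itertools import combinations

def leaf_cost(chosen, q, e_s, e_h, C):
    """Cost of running forker copies on `chosen` with the joiner on processor q."""
    return (sum(e_s[p] for p in chosen)
            + sum(min(Cb[p][q] for p in chosen) for Cb in C)
            + e_h[q])

def g(node, k, q, e_s, e_h, C):
    """Exact smallest leaf cost reachable from `node` (exhaustive over all extensions)."""
    rest = range(node[-1] + 1, len(e_s))
    return min(leaf_cost(node + ext, q, e_s, e_h, C)
               for ext in combinations(rest, k - len(node)))

def est(node, k, q, e_s, e_h, C):
    """Lower bound: next k-y cheapest instances plus the best link over all usable instances."""
    y, last = len(node), node[-1]
    nxt = range(last + 1, last + 1 + (k - y))          # the k-y instances right after i_y
    pool = list(node) + list(range(last + 1, len(e_s)))  # every instance a leaf could use
    return (sum(e_s[p] for p in node) + sum(e_s[p] for p in nxt)
            + sum(min(Cb[p][q] for p in pool) for Cb in C)
            + e_h[q])
```

Both functions expect a node that can still reach level k, that is, one with at least k - y instances after its last index.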


4.1.2  The  Proposed  Algorithm 

Three parameters of the branch-and-bound algorithm are the joiner instance ($t_{h,q}$), the number of
processors on which forker $s$ is allowed to run ($k$), and the up-to-date minimum cost ($\hat{z}$). The algorithm
$BB(k,q,\hat{z})$ is shown in Table 1.

The MCRP-SP problem can be solved by invoking $BB(k,q,\hat{z})$ $n^2$ times with parameters set to
different values. $BB(k,q,\hat{z})$ solves the problem $P^q_k$, while the whole procedure, shown in Table 2,
solves $P$.
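A compact Python sketch of the two procedures follows. It is not the authors' implementation: it assumes the cost model reconstructed above (`C[b][p][q]` for the minimum cost through subgraph b between forker instance p and joiner instance q, 0-based processor indices), uses a binary heap for the sorted queue, and the names `bb` and `opt` are hypothetical stand-ins for the functions of Tables 1 and 2.

```python
import heapq

def leaf_cost(chosen, q, e_s, e_h, C):
    """Cost of running forker copies on `chosen` with the joiner on processor q."""
    return (sum(e_s[p] for p in chosen)
            + sum(min(Cb[p][q] for p in chosen) for Cb in C) + e_h[q])

def est(node, k, q, e_s, e_h, C):
    """Lower bound on any level-k leaf below `node` (requires e_s sorted ascending)."""
    y, last = len(node), node[-1]
    pool = list(node) + list(range(last + 1, len(e_s)))
    return (sum(e_s[p] for p in node)
            + sum(e_s[p] for p in range(last + 1, last + 1 + k - y))
            + sum(min(Cb[p][q] for p in pool) for Cb in C) + e_h[q])

def bb(k, q, z, e_s, e_h, C):
    """Least-cost branch and bound for k forker copies with the joiner on q."""
    n = len(e_s)
    queue = [(0, ())]                       # (estimated cost, node); the root is empty
    while queue:
        _, u = heapq.heappop(queue)
        start = u[-1] + 1 if u else 0
        for j in range(start, n):
            v = u + (j,)
            if len(v) == k:                 # leaf: compare its actual cost
                z = min(z, leaf_cost(v, q, e_s, e_h, C))
            elif j + (k - len(v)) < n:      # enough instances left to reach level k
                bound = est(v, k, q, e_s, e_h, C)
                if bound < z:               # otherwise the node is pruned
                    heapq.heappush(queue, (bound, v))
    return z

def opt(e_s, e_h, C):
    """Minimum cost over all joiner positions q and all replication degrees k."""
    n = len(e_s)
    order = sorted(range(n), key=lambda p: e_s[p])   # sort instances by execution cost
    e_s = [e_s[p] for p in order]
    C = [[Cb[p] for p in order] for Cb in C]
    best = float("inf")
    for q in range(n):
        z = leaf_cost((0,), q, e_s, e_h, C)          # initial bound from a single copy
        for k in range(1, n + 1):
            z = bb(k, q, z, e_s, e_h, C)
        best = min(best, z)
    return best
```

Because `est` never exceeds the cost of any reachable leaf, pruning with `bound < z` cannot discard an optimal leaf, so `opt` returns the same value as an exhaustive search over all non-empty subsets of processors.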


4.2  Performance  Evaluation 

The essence of the branch-and-bound algorithm is the expansion of the intermediate nodes. Upon
the removal of a node from the queue, its children are generated and their estimated values are
computed. If the estimation function performs well and gives a tight lower bound of the objective
function, the number of expanded nodes should be small. Then an optimal solution can be found
as soon as possible.

We conduct two sets of experiments to evaluate the performance of the proposed solution. The
performance indices we consider are the number of enqueued intermediate nodes (EIM) and the
number of visited leaf nodes (VLF) during the search. We calculate EIM and VLF by inserting one
counter for each index at lines 13 and 8 of Table 1 respectively. Each time the execution reaches
line 13 (8), EIM (VLF) is incremented by 1.

The first set of experiments is on SP graphs of type $T_{and}$ where the communication cost between
any two task instances is arbitrary and is generated by a random number generator within the range
(1,50). The execution cost for each task instance is also randomly generated within the same range.
The second set of experiments is on SP graphs of type $T_{and}$ with the constraint of
computation-intensive applications. We vary the size of the problem by assigning different values to the number
of processors in the system ($n$) and the number of parallel-and subgraphs connected by forker and
joiner ($B$). For each size of the problem ($n$, $B$), we randomly generate 50 problem instances and
solve them. The results, including the average values of EIM and VLF over the solutions of 50
problem instances, are summarized in Table 3.

From Table 3, we find that the proposed method significantly reduces the number of
expansions for intermediate nodes and leaf nodes. For example, for problem size ($n$, $B$) = (20, 40),
the total number of leaf nodes is $2^{20}$ (= 1,048,576) if an exhaustive search is applied. However,
our algorithm only generates 16,857 nodes on the average, because we apply $est(v)$, $\hat{z}$, and the
branch-and-bound approach.

The branch-and-bound approach and the estimation function perform even better for the
computation-intensive applications. We can see that the EIM and VLF values are much smaller
in Set II than those in Set I. This is because, in computation-intensive applications, the optimal
number of replications for the forker is smaller than that in general applications. The $\hat{z}$ value in
function OPT() is able to reflect this fact and avoid the unnecessary expansions.


5 Sub-Optimal Replication for SP Graphs of Type $T_{and}$


The branch-and-bound algorithm in section 4.1 yields an optimal solution for $T_{and}$ subgraphs.
However, its complexity is exponential in the worst case. Hence, we also consider
finding a near-optimal solution in polynomial time.

5.1 Approximation Method


For the problem $P^q$ defined in section 4.1, we exploit an approximation approach to solve it in
polynomial time. The approach is based on iterative selection in a dynamic programming fashion.
Given a joiner instance $t_{h,q}$, subgraphs $G_b$, $b = 1, 2, \ldots, B$, and minimum costs $C^b_{p,q}$ between
$t_{h,q}$ and $t_{s,p}$, $p = 1, 2, \ldots, n$, and $b = 1, 2, \ldots, B$, we define $Sub(p,b)$ to be the sub-optimal
solution for replication of forker $s$ where forker instances $t_{s,1}, t_{s,2}, \ldots, t_{s,p}$ and subgraphs $G_1, G_2,
\ldots, G_b$ are taken into consideration.


Strategy 1:

$Sub(p,b)$ can be obtained from $Sub(p-1,b)$ by considering one more forker instance $t_{s,p}$. Strategy
1 consists of two steps. The first step is to initialize $Sub(p,b)$ to be $Sub(p-1,b)$ and to determine
whether $t_{s,p}$ is to be included in $Sub(p,b)$. If yes, then $t_{s,p}$ is added. The second step is to examine
whether any instances in $Sub(p-1,b)$ should be removed. Due to the possible inclusion of $t_{s,p}$ in
the first step, we may obtain a lower cost if we remove some instances $t_{s,i}$'s, $i < p$, and reassign the
communications for some graphs $G_j$'s from $t_{s,i}$'s to $t_{s,p}$.

Strategy 2:

$Sub(p,b)$ can also be obtained from $Sub(p,b-1)$ by taking one more subgraph $G_b$ into account.
Initially, $Sub(p,b)$ is set to be $Sub(p,b-1)$. The first step is to choose the best forker instance from
$t_{s,1}, t_{s,2}, \ldots, t_{s,p}$ for $G_b$. Let the best instance be $t_{s,x}$. The second step is to see whether $t_{s,x}$ is in $Sub(p,b)$.
If not, a condition is checked to decide whether $t_{s,x}$ should be added. Upon the
addition of $t_{s,x}$, we may remove some instances and reassign the communications to achieve a lower
cost.


We compare the two results obtained from the above two strategies and assign the one with
lower cost to the actual $Sub(p,b)$. Hence, by computing in a dynamic programming fashion, $Sub(n,B)$
can be obtained. The algorithm and its graphical interpretation are shown in Figure 9.


5.2  Performance  Evaluation 

The complexity involved in each strategy described in section 5.1 is $O(nB)$. Since solving
$Sub(n,B)$ needs to invoke strategies 1 and 2 $n \times B$ times, the total complexity of solving
$Sub(n,B)$ by the approximation method is $O(n^2B^2)$.

We conduct a set of experiments to evaluate the performance of the approximation method. For
each problem size ($n$, $B$), we randomly generate 50 instances and solve them by using the approximation
method and exhaustive searching. The data for computation and communication in the experiments
are based on the uniform distribution over the range [1,50]. We compare the minimum cost obtained
from exhaustive searching (EXHAUST) with those from approximation (APPROX) and the single
assignment solution (SINGLE). The optimal single assignment solution is the one in which only one
forker instance is allowed. Note that the solutions from SINGLE are obtained from the shortest
path algorithm [1]. The results are summarized in Table 4. From the table, we find that the
approximation method yields a tight approximation of the minimum cost. By contrast, the
error range for the single copy solution is at least 20%. This shows that replication can
lead to a lower cost than an optimal assignment does.
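The gap between SINGLE and EXHAUST can be seen on a toy instance. The sketch below is an assumption-laden illustration rather than the experimental setup: it treats a single $T_{and}$ limb with precomputed costs `C[b][p][q]`, and it finds the single-copy optimum by enumeration instead of the shortest path algorithm used in the paper. All names are hypothetical.

```python
from itertools import combinations

def replication_cost(chosen, q, e_s, e_h, C):
    """Total cost when forker copies run on `chosen` and the joiner runs on q."""
    return (sum(e_s[p] for p in chosen)
            + sum(min(Cb[p][q] for p in chosen) for Cb in C)
            + e_h[q])

def single(e_s, e_h, C):
    """Best cost when exactly one forker instance is allowed (SINGLE)."""
    n = len(e_s)
    return min(replication_cost((p,), q, e_s, e_h, C)
               for p in range(n) for q in range(n))

def exhaust(e_s, e_h, C):
    """Best cost over every non-empty set of forker instances (EXHAUST)."""
    n = len(e_s)
    return min(replication_cost(s, q, e_s, e_h, C)
               for q in range(n)
               for size in range(1, n + 1)
               for s in combinations(range(n), size))

# Two processors, two subgraphs: each subgraph is cheap to reach from one processor only.
e_s, e_h = [1, 1], [0, 0]
C = [[[0, 0], [10, 10]],
     [[10, 10], [0, 0]]]
print(single(e_s, e_h, C), exhaust(e_s, e_h, C))  # prints "11 2"
```

In this instance a single copy must pay one expensive link (cost 11), while replicating the forker on both processors pays only the two execution costs (cost 2).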

6 Solution of MCRP-SP for computation-intensive applications

6.1  The  Solution 

Given a computation-intensive application with its SP graph, we generate its parsing tree and
assignment graph first. The algorithm finds the minimum weight replication graph from the
assignment graph. Then the optimal solution is obtained from the minimum weight replication graph.

The algorithm traverses the parsing tree in postfix order. Namely, during the traversal, an
optimal solution of the subtree $S_x$, induced by an intermediate node $x$ along with all of $x$'s descendant
nodes, can be found only after the optimal solutions of $x$'s descendant nodes are found. Given an
SP graph $G$ and a distributed system $S$, we know that there is a one-to-one correspondence between
each subtree $S_x$ in a parsing tree $T(G)$ and a limb in the assignment graph of $G$ on $S$. Whenever a
child node $x$ is visited, the corresponding limb in the assignment graph will be replaced with
a two-layer $T_{chain}$ limb if $x$ is a $T_{chain}$- or $T_{or}$-type node; and a one-layer $T_{unit}$ limb if $x$ is a $T_{and}$-type
node. The algorithm is shown in Table 5. A graphical demonstration of how the algorithm solves
the problem is shown in Figure 10.

Before the replacement of a $T_{chain}$ limb is performed (i.e. $x$ is a $T_{chain}$-type node), each
constituent child limb has been replaced with a $T_{unit}$ or two-layer $T_{chain}$ limb. Hence, the shortest
path algorithm [1] can be used to compute the weights of the new edges between each node in the
source layer and each node in the sink layer of the new $T_{chain}$ limb. The complexity, from lines 05
to 08 of Table 5, of transforming the limb corresponding to an intermediate node $x$ with $M$
children into a two-layer $T_{chain}$ limb is $O(Mn^{[\ldots]})$. An example illustrating the replacement of a
$T_{chain}$ limb is shown from parts (b) to (c) and parts (d) to (e) in Figure 10.

For the replacement of a $T_{and}$ limb, we have to compute $C^b_{p,q}$'s. The values can also be computed
by the shortest path algorithm. Hence, the complexity involved in lines 16 and 17 is $O(Bn^{[\ldots]})$.
According to the computational model in section 2.2, each task instance of $s$ may start its execution
if it receives the necessary data from any task instance of its predecessor $d$. And, from Lemma
2, we know that the minimum sum of initialization costs of multiple task instances of $s$ will
always be from only one task instance of $d$. Therefore, the initialization of task instance $t_{s,p}$ depends
on which task instance of $d$ it communicates with. That is why, in line 19, the communication
cost is added to the execution cost $e_{s,p}$ before OPT() is invoked. The most
significant part of the replacement is to compute the weights on the new edges from the source
layer to the sink layer. The complexity is $n \times O(OPT())$, which in the worst case is $n^2 2^n$. However, on
average, our OPT function performs well and reduces the complexity significantly. An
example illustrating the replacement of a $T_{and}$ limb is shown from parts (c) to (d) in Figure 10.

We also consider using the approximation method to find the sub-optimal replacement of a
$T_{and}$ limb. In that case, function OPT() at line 21 is replaced with $Sub(n, B)$. The total complexity
involved is then $O(n^4B^2)$.

Finally, for the replacement of a $T_{or}$ limb, if there are $B$ subgraphs connected between the forker
and the joiner, then the complexity will be $O(Bn^{[\ldots]})$ for the new edges and $O(Bn^{[\ldots]})$ for the $C^b_{p,q}$'s. An
example illustrating the replacement of a $T_{or}$ limb is shown from parts (a) to (b) in Figure 10.

When the traversal reaches the root node of the parsing tree, the result of FIND() will give
us either one single layer or two layers, depending on the type of root node. All we have to do is
to select the lightest of these $n$ (in single layer) or $n^2$ (in two layers) shortest path combinations.
An optimal replication graph itself is found by combining the shortest paths between the selected
nodes that were saved earlier. The whole algorithm has the complexity of

\[
O(An^2 2^n) + \sum_i O(B_i n^{[\ldots]}) + \sum_i O(C_i n^{[\ldots]})
\]

where $A$ is the number of $T_{and}$ limbs, $B_i$ is the number of subgraphs in the $i$th $T_{or}$ limb, and $C_i$ is
the number of layers in the $i$th $T_{chain}$ limb. This is not greater than $O(Mn^2 2^n)$, where $M$ is the
total number of tasks in the SP graph. The complexity of the algorithm is a linear function of $M$
if the number of processors, $n$, is fixed.

6.2 Concluding Remarks

This paper has focused on MCRP-SP, the optimal replication problem of SP task graphs for
computation-intensive applications. The purpose of replication is to reduce inter-processor
communication and to fully utilize the processor power in distributed systems. The SP graph model,
which is extensively used in modeling applications in distributed systems, is used. The applications
considered in this paper are computation-intensive, in which the execution cost of a task is greater
than its communication cost. We prove that MCRP-SP is NP-complete. We present branch-and-bound
and approximation methods for SP graphs of type $T_{and}$. The numerical results show that
the algorithm performs very well and avoids a lot of unnecessary searching. Finally, we present an
algorithm to solve the MCRP-SP problem for computation-intensive applications. The proposed
optimal solution has the complexity of $O(n^2 2^n M)$ in the worst case, while the approximation
solution has the complexity of $O(n^4 M^2)$, where $n$ is the number of processors in the system and $M$ is
the number of tasks in the graph.

For applications in which the communication cost between two tasks is greater than the
execution cost of a task, replication can still be used to reduce the total cost. However, in the
extreme case where the execution cost of each task is zero, the optimal allocation will be to assign
each task to one processor. We are studying the optimal replication for the general case.

Figure 1: An SP graph and its parsing tree

Figure 2: An example to show how the replication can reduce the total cost

Figure 7: An illustration of how to transform a UCTR instance to a $T_{and}$ SP graph
Table 1: Function [...]: branch-and-bound algorithm for solving the problem

01  Initialize the queue to be empty;
02  Insert root node v_0 into the queue;
03  While the queue is not empty do begin
04      Remove the first node u from the queue;
05      Generate all child nodes of u;
06      For each generated child node v do begin
07          If v is a leaf node (i.e. v is at level k) then
08              Compute g(v) by setting [...];
09              Set ẑ = min(ẑ, g(v));
10          else begin  /* v is an intermediate node */
11              Compute est(v) by (5);
12              If est(v) < ẑ then
13                  Insert v into the queue according to est(v);
14          end;
15      end;
16  end;
17  Return(ẑ).
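
The best-first search of Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary decision tree, the names `branch_and_bound`, `est` and `cost`, and the toy cost table are all hypothetical. Here `est` stands in for the lower bound of equation (5) and `cost` for the leaf value g(v).

```python
import heapq

def branch_and_bound(k, est, cost):
    """Best-first branch and bound over a binary tree of depth k.

    est(v) is a lower bound on the cost of any leaf below partial node v;
    cost(v) is the exact cost of leaf v. Returns (best cost, leaves visited).
    """
    z_hat = float("inf")              # incumbent: best leaf cost found so far
    visited = 0
    queue = [(est(()), ())]           # priority queue ordered by est
    while queue:
        _, u = heapq.heappop(queue)
        for bit in (0, 1):            # generate the child nodes of u
            v = u + (bit,)
            if len(v) == k:           # leaf: evaluate and update the incumbent
                visited += 1
                z_hat = min(z_hat, cost(v))
            elif est(v) < z_hat:      # intermediate: keep only promising nodes
                heapq.heappush(queue, (est(v), v))
    return z_hat, visited


# Toy instance (hypothetical): the decision at level i costs c[i][0] or c[i][1].
c = [(3, 1), (2, 5), (4, 4)]
cost = lambda v: sum(c[i][b] for i, b in enumerate(v))
est = lambda v: cost(v) + sum(min(pair) for pair in c[len(v):])
best, visited = branch_and_bound(3, est, cost)
print(best, visited)
```

As in Table 1, a node is pruned only when it is generated, so a node that was queued before the incumbent improved is still expanded later. Re-checking the bound when a node is removed from the queue would be a natural refinement, but it is not part of the table.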


Table 2: Function OPT(C's, e's): the optimal solution of MCRP-SP of type $T_{and}$ when the e's and C's are given

01  Sort the t_{a,p}'s into non-decreasing order by the values of the e_{a,p}'s;
02  For g = 1 to n do begin
03      Let node v be a leaf node at level 1;
04      Set [...] to be [...] and [...] to be 1;
05      Compute g(v) by setting [...];
06      Initialize ẑ to be g(v);
07      For i = 1 to n do
08          ẑ = [...](ẑ, g, 2);
09      Set c(g) = ẑ;
10  end;
11  Output the combination with the minimum value among c(1), c(2), ..., c(n).

Table 3: Computation results for the branch-and-bound approach

                  Set I                      Set II            Total number of
  n    B         EIM†         VLF†         EIM†         VLF†      leaves (2^n)
  4   20            2            6            4            7              16
      24            3            6            3            6              16
      28            4            7            3            6              16
      32            4            7            3            6              16
      36            4            7            4            7              16
      40            3            6            3            6              16
  8   20           36           74           16           51             256
      24           40           75           21           62             256
      28           50           86           26           68             256
      32           63           94           37           78             256
      36           73           96           47           84             256
      40           81           97           50           86             256
 12   20          186          558           81          340           4,096
      24          231          639          102          398           4,096
      28  [illegible]          839          167          543           4,096
      32          451  [illegible]          204          617           4,096
      36          454          984          269          720           4,096
      40          636        1,186          301          780           4,096
 16   20  [illegible]  [illegible]  [illegible]  [illegible]          65,536
      24        1,065        4,161          329        1,711          65,536
      28        1,335        4,862          546        2,496          65,536
      32  [illegible]  [illegible]          726        3,127          65,536
      36        2,322        7,227          839  [illegible]          65,536
      40        2,880        8,511        1,179        4,510          65,536
 20   20  [illegible]  [illegible]          389        3,079       1,048,576
      24  [illegible]  [illegible]          761        5,280       1,048,576
      28        5,551       27,018        1,227        7,905       1,048,576
      32        6,405       30,521        1,709       10,357       1,048,576
      36        9,517       40,767  [illegible]  [illegible]       1,048,576
      40       11,651       48,087        3,086       16,857       1,048,576

†: Each value shown is the average value over 50 runs.

Table 4: Simulation results for the approximation method

  n    B      SINGLE†      APPROX†     EXHAUST†   single error %   approx error %
  4   20         2876         2407         2400               20             0.28
      24         3463         2835         2831               22             0.16
      28         4032         3264         3259               24             0.18
      32         4606         3678         3673               25      [illegible]
      36         5198         4084         4082               27             0.05
      40         5790         4514         4514               28             0.00
  8   20         2794         2282         2250               24             1.46
      24         3356         2672         2636               27             1.38
      28         3931         3060         3028               30             1.05
      32  [illegible]         3443         3413               33      [illegible]
      36         5127         3831         3800               35             0.80
      40         5683         4215         4192               36             0.55
 12   20         2767         2213         2161               28             2.42
      24         3359         2592         2542               32             1.99
      28         3912         2996         2941               33             1.88
      32         4491         3364         3299               36      [illegible]
      36  [illegible]         3736         3676               38             1.62
      40         5610         4101         4043               39             1.43
 16   20         2733         2167         2111               29             2.66
      24         3287         2558         2492      [illegible]             2.66
      28         3844         2932         2865               34             2.31
      32         4393         3315         3240               36             2.32
      36  [illegible]         3659         3584               39             2.10
      40  [illegible]         4045         3970               40             1.89

†: Each value shown is the average value over 50 runs.

single error % = (SINGLE - EXHAUST) / EXHAUST x 100%.

approx error % = (APPROX - EXHAUST) / EXHAUST x 100%.
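
The two error definitions given with Table 4 can be evaluated directly. The sketch below is illustrative only: `error_percent` is a hypothetical helper name, and the three costs are read from the first data row of the table (B = 20) on the assumption that its three cost columns are SINGLE, APPROX and EXHAUST in that order. Because the table reports averages over 50 runs, a ratio recomputed from the averaged costs need not match the printed error exactly.

```python
def error_percent(cost, exhaust):
    """Relative excess of `cost` over the exhaustive-search optimum, in percent."""
    return (cost - exhaust) / exhaust * 100.0


# First data row of Table 4 (B = 20); column assignment is an assumption.
single, approx, exhaust = 2876, 2407, 2400
single_err = error_percent(single, exhaust)
approx_err = error_percent(approx, exhaust)
print(f"single error %: {single_err:.2f}")
print(f"approx error %: {approx_err:.2f}")
```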

Table 5: Algorithm FIND(S_x): the algorithm for finding the shortest path combinations from the limb which corresponds to the subtree S_x induced by an intermediate node x and all of x's descendant nodes in a parsing tree

01  Case of the type of intermediate node x:
02  Type T_chain :
03      For b = the first child node of x to the last one do
04          FIND(S_b);  /* Now the limb corresponding to S_b is replaced */
05      Replace the limb corresponding to S_x with a two-layer T_chain limb where
06          the source (sink) layer of the old limb is the source (sink) layer of the new 2-layer limb;
07      Put weights on the edges between source and sink layers equal to the shortest path
08          between the corresponding nodes;
09
10  Type T_and :  /* Let x = [T_and, forker s, joiner h] */
11      Let d be the predecessor of forker s in G (i.e. <d,s> ∈ V);
12      Let B be the number of child nodes of x in the parsing tree;
13          /* i.e. there are B subgraphs connected by s and h */
14      For b = the first child node of x to the B-th child of x do
15          FIND(S_b);  /* Now the limb corresponding to S_b is replaced */
16      For p = 1 to n, q = 1 to n and b = 1 to B do
17          Compute the minimum replication cost from t_{s,p} to t_{h,q} w.r.t. child b;
18      For i = 1 to n do begin
19          For p = 1 to n do E_{s,p} = μ_{d,s}(i,p) + e_{s,p};
20              /* accounts for initialization by t_{d,i} and execution cost itself. */
21          For q = 1 to n do [...] = OPT(C's, E's);
22              /* Create new edges from t_{d,i}'s to t_{h,q}'s */
23      end;
24      Replace the T_and limb with a T_unit limb, where source layer = sink layer = layer h,
25          and there are new edges from layer d to layer h;
26
27  Type T_or :  /* Let x = [T_or, forker s, joiner h] */
28      Use the same method described above from lines 12 to 17 to compute the C's;
29      Replace the T_or limb with a two-layer T_chain limb, where
30          the source (sink) layer of the T_or limb is the source (sink) layer of the T_chain limb and
31          μ_{s,h}(p,q) = min_b (C^b_{p,q}), for all p and q;
32  end case;
33  Save the shortest paths between any node in the source layer and any node
    in the sink layer for future reference.

in which the execution cost of a task is greater than its communication cost. The MCRP-SP problem for such applications is proved to be NP-complete. We present a branch-and-bound method to find an optimal solution as well as an approximation approach for a suboptimal solution. The numerical results show that such replication may lead to a lower cost than the optimal assignment (in which each task is assigned to only one processor). The proposed optimal solution has a complexity of $O(n^2 2^n M)$, while the approximation solution has $O(n^4 M^2)$, where $n$ is the number of processors in the system and $M$ is the number of tasks in the graph.

Abstract

Traditional control systems have been designed to exercise control at regularly spaced time instants. When a discrete version of the system dynamics is used, a constant sampling interval is assumed and a new control value is calculated and exercised at each time instant. In this paper we formulate a new control scheme, temporal control, in which we not only calculate the control value but also decide the time instants when the new values are to be used. Taking as an example a discrete, linear, time-invariant system and a cost function which reflects a cost for computation of the control values, we show the feasibility of using this scheme. We formulate the temporal control scheme as a feedback scheme and, through a numerical example, demonstrate the significant reduction in cost through the use of temporal control.

1 Introduction


Control systems have been used for the control of dynamic systems by generating and exercising control signals. The traditional approach for feedback control has been to define the control signals, $u(t)$, as a function of the current state of the system, $x(t)$. As the state of the system changes continuously, the controls change continuously, i.e. they are defined as functions of time, $t$, such that time is treated as a continuous variable. When computers are used for implementing the control systems, due to the discrete nature of computations, time is treated as a discrete variable obtained by regularly spaced sampling of the time axis every $\Delta$ seconds. Many standard control formulations are defined for the discrete version of the system, with system dynamics expressed at discrete time instants. In these formulations the system dynamics and the control are expressed as sequences, $x(k)$ and $u(k)$.

Most  of  the  traditional  control  systems  were  designed  for  dedicated  controllers  which  had  only 
one  function,  to  accept  the  state  values,  x(k)  and  generate  the  control,  u(k).  However,  when  a 
general  purpose  computer  is  used  as  a  controller,  it  has  the  capabilities,  and  may,  therefore,  be 
used  for  other  functions.  Thus,  it  may  be  desirable  to  take  into  account  the  cost  of  computations 
and  consider  control  laws  which  do  not  compute  the  new  value  of  the  control  at  every  instant. 
When  no  control  is  to  be  exercised,  the  computer  may  be  used  for  other  functions.  In  this  paper 
we  formulate  such  a  control  law  and  show  how  it  can  be  used  for  control  of  systems,  achieving  the 
same  degree  of  control  as  traditional  control  systems  while  reducing  computation  costs  by  changing 
the  control  at  a  few,  specific  time  instants.  We  term  this  temporal  control. 

To the best of our knowledge this approach to the design and implementation of controls has not been studied in the past. Taking computation time delay into consideration for real-time computer control has been studied in several research papers [1, 5, 6, 9, 11, 13]. However, all of these papers concentrated on examining the effects of computation time delay and compensating for them while maintaining the assumption of exercising controls at regularly spaced time instants.

The  basic  idea  of  temporal  control  is  to  determine  not  only  the  values  for  u  but  also  the  time 
instants  at  which  the  values  are  to  be  calculated  and  changed.  The  control  values  are  assumed 
to  remain  constant  between  changes.  By  exercising  control  over  the  time  instants  of  changes  the 
designer  has  an  additional  degree  of  freedom  for  optimization.  In  this  paper  we  present  the  idea  and 
demonstrate  its  feasibility  through  an  example  using  a  discrete,  linear,  and  time  invariant  system. 
Clearly, the same idea can be extended to continuous-time as well as non-linear systems.

The paper is organized as follows. In Section 2, we formulate the temporal control problem and introduce computation cost into the performance index function. The solution approach for the temporal control scheme is discussed in Section 3. In Section 4, implementation issues are addressed. We provide an example of controlling a rigid body satellite in Section 5. In this example, an optimal temporal controller is designed. Results show that the temporal control approach performs better than the traditional sampled data control approach with the same number of control exercises. Section 6 deals with the application of temporal control to the design of real-time control systems. Finally, in Section 7, we present our conclusions.


2  Problem  Formulation 

In temporal control, the number of control changes and their exercising time instants within the controlling interval $[0, T_f]$ are decided so as to minimize a cost function. To formulate the temporal control problem for a discrete, linear time-invariant system, we first discretize the time interval $[0, T_f]$ into $M$ subintervals of length $\Delta = T_f / M$. Let $D_M = \{0, \Delta, 2\Delta, \ldots, (M-1)\Delta\}$ denote the $M$ regularly spaced time instants. Here, control exercising time instants are restricted to $D_M$ for simplicity. The linear time-invariant controlled process is described by the difference equation:

$$x(k+1) = A x(k) + B u(k) \tag{1}$$

$$y(k) = C x(k)$$

where $k$ is the time index. One unit of time represents the subinterval $\Delta$, whereas $x \in \mathcal{R}^n$ and $u$ are the state and input vectors respectively.

It is well known that there exists an optimal control law [4]

$$u^o(i) = f[x(i)], \qquad i = 0, 1, \ldots, M-1 \tag{2}$$

that minimizes the quadratic performance index function (Cost)

$$J_M = \sum_{k=0}^{M-1} \left[ x^T(k) Q x(k) + u^T(k) R u(k) \right] + x^T(M) Q x(M) \tag{3}$$

where $Q$ is positive semi-definite and $R$ is positive definite.

As we can see, a traditional controller exercises control at every time instant in $D_M$. However, in temporal control, we are no longer constrained to exercise control at every time instant in $D_M$. Therefore, we want to find an optimal control law, $\delta$ and $g$, for $i = 0, 1, \ldots, M-1$:

$$u^o(i) = \begin{cases} u^o(i-1) & \text{if } \delta(i) = 0 \\ g[x(i)] & \text{if } \delta(i) = 1 \end{cases} \tag{4}$$

that minimizes a new performance index function

$$J'_M = \sum_{k=0}^{M-1} \left[ x^T(k) Q x(k) + u^T(k) R u(k) \right] + x^T(M) Q x(M) + \sum_{i=0}^{M-1} \delta(i) \mu = J_M + C_M \tag{5}$$

Here, $\mu$ is the computation cost of getting a new control value at a time instant, and $C_M = \sum_{i=0}^{M-1} \delta(i) \mu$ denotes the total computation cost. Note that $\nu = \sum_{k=0}^{M-1} \delta(k)$ is the number of control changes. Also, let $D_\nu = \{t_0, t_1, t_2, \ldots, t_{\nu-1}\}$ consist of the control changing time instants, where $t_0 = 0$, $t_1 = n_1 \Delta$, ..., $t_{\nu-1} = n_{\nu-1} \Delta$. That is, $n_0, n_1, n_2, \ldots, n_{\nu-1}$ are the indices of the control changing time instants and $\delta(n_i) = 1$ for $i = 0, 1, 2, \ldots, \nu-1$.

With this new setting we need to choose $\nu$, $D_\nu$, and the control input values to find an optimal controller which minimizes $J'_M$. This new cost function differs from $J_M$ in two respects. First, the concept of computation cost is introduced in $J'_M$ as the $C_M$ term to regulate the number of control changes chosen. If we do not take this computation cost into consideration, $\nu$ is likely to become $M$. If the computation cost is high (i.e., $\mu$ has a large value), then $\nu$ is likely to be small in order to minimize the total cost function. Second, in temporal control, we seek not only an optimal control law $g$, but also the control exercising time instants and the number of control changes. In the next section, we present in detail specific techniques for finding an optimal temporal control law.
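
To make the cost definition concrete, the following sketch evaluates $J'_M = J_M + C_M$ for a scalar system. It is illustrative only: the restriction to scalars, the function name `temporal_cost`, and the argument layout are assumptions made for this example and are not part of the formulation above.

```python
def temporal_cost(a, b, q, r, mu, x0, delta, new_u):
    """Evaluate J'_M = J_M + C_M for the scalar system x(k+1) = a*x(k) + b*u(k).

    delta[k] is 1 if a new control value is computed at instant k, else 0.
    new_u[k] is the value applied when delta[k] == 1 (ignored otherwise).
    """
    if delta[0] != 1:
        raise ValueError("a control value must be computed at k = 0")
    x, u = x0, 0.0
    j_m = 0.0
    for k in range(len(delta)):
        if delta[k] == 1:
            u = new_u[k]                 # the control changes at this instant
        j_m += q * x * x + r * u * u     # stage cost at step k
        x = a * x + b * u                # state update, equation (1)
    j_m += q * x * x                     # terminal cost on x(M)
    c_m = mu * sum(delta)                # one computation charge per change
    return j_m + c_m


# One change at k = 0, held for two steps (M = 2).
cost = temporal_cost(a=1.0, b=1.0, q=1.0, r=1.0, mu=0.5,
                     x0=1.0, delta=[1, 0], new_u=[-3 / 7, None])
print(cost)
```

Raising `mu` leaves $J_M$ unchanged but penalizes every extra 1 in `delta`, which is the trade-off the search over $\nu$ has to resolve.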

3  Temporal  Control 

We develop a three-step procedure for finding an optimal temporal controller.

Step 1. Find an optimal control law given $\nu$ and $D_\nu$.
Step 2. Find the best $D_\nu$ given $\nu$.
Step 3. Find the best $\nu$.

First, in the following two subsections (3.1 and 3.2) we derive a temporal control law which minimizes the cost function $J'_M$ when $D_\nu$ is given, i.e., both the time instants and the number of controls are fixed. Since $\nu$ and $D_\nu$ are fixed, we can use $J_M$ defined in (5) as a cost function instead of $J'_M$. Secondly, assume that $\nu$ is fixed but $D_\nu$ can vary. We then present an algorithm in Section 3.3 to find a $D_\nu^o$ such that $J_M$ (and $J'_M$) is minimized. Finally, we vary $\nu$ from 1 to $\nu_{max}$ to search for an optimal $D^o$ at which temporal control should be exercised. Section 3.4 presents this iteration procedure. Section 3.5 explains how to incorporate terminal state constraints into the above procedure of getting an optimal temporal control law. A complete algorithm for the above procedure is described in Section 3.6. Finally, in Section 3.7 we explain how to get optimal temporal controllers over an initial state space.

3.1 Closed-loop Temporal Control with Given $D_\nu$

Assume that $\nu$ and $D_\nu$ are given. Then a new control input calculated at $t_i$ will be applied to the actuator for the next time interval from $t_i$ to $t_{i+1}$. Our objective here is to determine the optimal control law

$$u^o(n_i) = g[x(n_i)], \qquad i = 0, 1, \ldots, \nu-1$$

that minimizes the quadratic performance index function (Cost) $J_M$ which is defined in (5).

Figure 1: Decomposition of $J_M$ into $F_i$ (state cost and control input cost)

The principle of optimality, developed by Richard Bellman [2, 3], is the approach used here. That is, if a closed-loop control $u^o(n_i) = g[x(n_i)]$ is optimal over the interval $t_0 \le t \le t_\nu$, then it is also optimal over any sub-interval $t_m \le t \le t_\nu$, where $0 \le m \le \nu$. As can be seen from Figure 1, the total cost $J_M$ can be decomposed into $F_i$'s for $0 \le i \le \nu$, where

$$\begin{aligned} F_i ={}& x^T(n_i) Q x(n_i) + x^T(n_i+1) Q x(n_i+1) + x^T(n_i+2) Q x(n_i+2) + \cdots \\ &+ x^T(n_{i+1}-1) Q x(n_{i+1}-1) + (n_{i+1}-n_i)\, u^T(n_i) R u(n_i) \end{aligned} \tag{7}$$

That is, from (1),

$$\begin{aligned} F_i ={}& x^T(n_i) Q x(n_i) + (A x(n_i) + B u(n_i))^T Q (A x(n_i) + B u(n_i)) \\ &+ (A^2 x(n_i) + AB u(n_i) + B u(n_i))^T Q (A^2 x(n_i) + AB u(n_i) + B u(n_i)) \\ &+ \cdots + (A^{n_{i+1}-n_i-1} x(n_i) + A^{n_{i+1}-n_i-2} B u(n_i) + \cdots + AB u(n_i) + B u(n_i))^T Q \\ &\quad (A^{n_{i+1}-n_i-1} x(n_i) + A^{n_{i+1}-n_i-2} B u(n_i) + \cdots + AB u(n_i) + B u(n_i)) \\ &+ (n_{i+1}-n_i)\, u^T(n_i) R u(n_i) \end{aligned} \tag{8}$$

This can be rewritten as

$$\begin{aligned} F_i ={}& x^T(n_i) Q x(n_i) + \sum_{j=1}^{n_{i+1}-n_i-1} [A_j x(n_i) + B_j u(n_i)]^T Q [A_j x(n_i) + B_j u(n_i)] \\ &+ (n_{i+1}-n_i)\, u^T(n_i) R u(n_i) \end{aligned} \tag{9}$$

where $A_j = A^j$ and $B_j = \sum_{k=0}^{j-1} A^k B$.

Then $J_M$ can be expressed as

$$J_M = F_0 + F_1 + F_2 + \cdots + F_\nu. \tag{10}$$

Let $S_m$ be the cost from $i = \nu-m+1$ to $i = \nu$:

$$S_m = F_{\nu-m+1} + F_{\nu-m+2} + \cdots + F_{\nu-1} + F_\nu, \qquad 1 \le m \le \nu+1. \tag{11}$$

These cost terms are illustrated in Figure 1.

Therefore, by applying the principle of optimality, we can first minimize $S_1 = F_\nu$, then choose $F_{\nu-1}$ to minimize $S_2 = F_{\nu-1} + F_\nu = S_1^o + F_{\nu-1}$, where $S_1^o$ is the optimal cost incurred at $t_\nu$. We can continue by choosing $F_{\nu-2}$ to minimize $S_3 = F_{\nu-2} + F_{\nu-1} + F_\nu = F_{\nu-2} + S_2^o$, and so on until $S_{\nu+1} = J_M$ is minimized. Note that $S_1 = F_\nu = x^T(n_\nu) Q x(n_\nu)$ is determined only by $x(n_\nu)$, which is independent of any other control inputs.
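3.2 Inductive Construction of an Optimal Control Law with Given $D_\nu$

We inductively derive an optimal controller which changes its control at the $\nu$ time instants in $D_\nu$. As we showed in the previous section, the inductive procedure goes backwards in time from $S_1^o$ to $S_{\nu+1}^o$. Since $S_1 = F_\nu = x^T(n_\nu) Q x(n_\nu) + u^T(n_\nu) R u(n_\nu)$ and $x(n_\nu)$ is independent of $u(n_\nu)$, we can let $u^o(n_\nu) = u^o(M) = 0$ and $S_1^o = x^T(n_\nu) Q x(n_\nu)$, where $Q$ is symmetric and positive semi-definite.

Induction Basis: $S_1^o = x^T(n_\nu) Q x(n_\nu)$, where $Q$ is symmetric.

Inductive Assumption: Suppose that

$$S_m^o = x^T(n_{\nu-m+1})\, P(\nu-m+1)\, x(n_{\nu-m+1})$$

holds for some $m$ where $1 \le m \le \nu$ and $P(\nu-m+1)$ is symmetric.

We can write $S_m^o$ as

$$\begin{aligned} S_m^o ={}& [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})]^T P(\nu-m+1) \\ &\cdot [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})] \end{aligned}$$

From the definition of $S_m$ and (9),

$$\begin{aligned} S_{m+1} ={}& S_m^o + F_{\nu-m} \\ ={}& S_m^o + x^T(n_{\nu-m}) Q x(n_{\nu-m}) \\ &+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})]^T Q [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})] \\ &+ (n_{\nu-m+1}-n_{\nu-m})\, u^T(n_{\nu-m}) R u(n_{\nu-m}) \end{aligned}$$

And the above equation becomes

$$\begin{aligned} S_{m+1} ={}& [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})]^T P(\nu-m+1) \\ &\cdot [A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m})] \\ &+ x^T(n_{\nu-m}) Q x(n_{\nu-m}) \\ &+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})]^T Q [A_j x(n_{\nu-m}) + B_j u(n_{\nu-m})] \\ &+ (n_{\nu-m+1}-n_{\nu-m})\, u^T(n_{\nu-m}) R u(n_{\nu-m}) \end{aligned}$$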
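If we differentiate $S_{m+1}$ with respect to $u(n_{\nu-m})$, then

$$\begin{aligned} \frac{\partial S_{m+1}}{\partial u(n_{\nu-m})} ={}& 2 B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) \\ &+ 2 B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} u(n_{\nu-m}) \\ &+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [2 B_j^T Q A_j x(n_{\nu-m}) + 2 B_j^T Q B_j u(n_{\nu-m})] \\ &+ 2 (n_{\nu-m+1}-n_{\nu-m}) R u(n_{\nu-m}) \\ ={}& 2 \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q A_j \Big\} x(n_{\nu-m}) \\ &+ 2 \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R \Big\} u(n_{\nu-m}) \end{aligned}$$

Note that $P(\nu-m+1)$ is symmetric and that the following three rules are applied to differentiate $S_{m+1}$ above:

$$\frac{\partial}{\partial x}(x^T Q x) = 2 Q x, \qquad \frac{\partial}{\partial x}(x^T Q y) = Q y, \qquad \frac{\partial}{\partial y}(x^T Q y) = Q^T x$$

Let $\partial S_{m+1} / \partial u(n_{\nu-m}) = 0$; from Lemma 1 and Lemma 2 given later we can obtain the $u^o(n_{\nu-m})$ which minimizes $S_{m+1}$ and thus obtain $S_{m+1}^o$:

$$\begin{aligned} u^o(n_{\nu-m}) ={}& - \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R \Big\}^{-1} \\ &\cdot \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q A_j \Big\} x(n_{\nu-m}) \\ ={}& - K(\nu-m)\, x(n_{\nu-m}) \end{aligned} \tag{17}$$

where $K(\nu-m)$ is defined in (17).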




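Therefore, we can write

$$\begin{aligned} x(n_{\nu-m+1}) &= A_{n_{\nu-m+1}-n_{\nu-m}} x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}} u^o(n_{\nu-m}) \\ &= [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]\, x(n_{\nu-m}) \end{aligned} \tag{18}$$

If we use (17) and (18), we have

$$\begin{aligned} S_{m+1}^o ={}& \{[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]\, x(n_{\nu-m})\}^T P(\nu-m+1) \\ &\cdot \{[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]\, x(n_{\nu-m})\} \\ &+ x^T(n_{\nu-m}) Q x(n_{\nu-m}) \\ &+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} \{[A_j - B_j K(\nu-m)]\, x(n_{\nu-m})\}^T Q \{[A_j - B_j K(\nu-m)]\, x(n_{\nu-m})\} \\ &+ (n_{\nu-m+1}-n_{\nu-m}) [K(\nu-m) x(n_{\nu-m})]^T R [K(\nu-m) x(n_{\nu-m})] \end{aligned} \tag{19}$$

This equation can be rewritten as

$$\begin{aligned} S_{m+1}^o ={}& x^T(n_{\nu-m}) \Big\{ [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]^T P(\nu-m+1) \\ &\quad \cdot [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)] + Q \\ &\quad + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j - B_j K(\nu-m)]^T Q [A_j - B_j K(\nu-m)] \\ &\quad + (n_{\nu-m+1}-n_{\nu-m}) K^T(\nu-m) R K(\nu-m) \Big\} x(n_{\nu-m}) \\ ={}& x^T(n_{\nu-m})\, P(\nu-m)\, x(n_{\nu-m}) \end{aligned} \tag{20}$$

where $P(\nu-m)$ is obtained from $K(\nu-m)$ and $P(\nu-m+1)$ as in (20). Also note that knowing $P(\nu-m+1)$ is enough to compute $K(\nu-m)$ because the other terms of (17) are known a priori.

Therefore, we find a symmetric matrix $P(\nu-m)$ satisfying $S_{m+1}^o = x^T(n_{\nu-m}) P(\nu-m) x(n_{\nu-m})$. From (17) and (20), we have the following recursive equations for obtaining $P(\nu-m)$ from $P(\nu-m+1)$, where $m = 1, 2, \ldots, \nu$.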


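$$\begin{aligned} K(\nu-m) ={}& \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R \Big\}^{-1} \\ &\cdot \Big\{ B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q A_j \Big\} \end{aligned} \tag{21}$$

$$\begin{aligned} P(\nu-m) ={}& [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)]^T P(\nu-m+1) \\ &\cdot [A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}} K(\nu-m)] + Q \\ &+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} [A_j - B_j K(\nu-m)]^T Q [A_j - B_j K(\nu-m)] \\ &+ (n_{\nu-m+1}-n_{\nu-m}) K^T(\nu-m) R K(\nu-m) \end{aligned} \tag{22}$$

Also, we know that at each time instant $n_{\nu-m}\Delta$

$$u^o(n_{\nu-m}) = -K(\nu-m)\, x(n_{\nu-m}) \tag{23}$$

Hence, with $P(\nu) = Q$, we can obtain $K(i)$ and $P(i)$ for $i = \nu-1, \nu-2, \ldots, 0$ recursively using (21) and (22). At each time instant $n_i\Delta$, $i = 0, 1, 2, \ldots, \nu-1$, the new control input value is obtained using (23) by multiplying $K(i)$ by $\hat{x}(n_i)$, where $\hat{x}(n_i)$ is the estimate of the system state at $n_i\Delta$. Also, note that the optimal control cost is $J_M^o = x^T(0) P(0) x(0)$, where $P(0) $ is found from the above procedure.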
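
The backward recursion (21)-(22) and the feedback law (23) can be checked numerically in the scalar case. The sketch below is a minimal illustration under assumptions, not the authors' code: the state and input are taken to be scalars (so the matrix inverse in (21) becomes a division), the names `temporal_gains` and `simulate_cost` are hypothetical, and the final index $n_\nu$ is taken to be $M$.

```python
def temporal_gains(a, b, q, r, change_idx, M):
    """Backward recursion for K(i) and P(i), scalar case.

    change_idx = [n_0, ..., n_{nu-1}] with n_0 = 0, strictly increasing, all < M.
    Returns (K, P) with K[i] = K(i) for i < nu and P[i] = P(i) for i <= nu.
    """
    bounds = list(change_idx) + [M]          # n_nu = M closes the last interval
    nu = len(change_idx)
    K = [0.0] * nu
    P = [0.0] * nu + [q]                     # P(nu) = Q
    for i in range(nu - 1, -1, -1):
        d = bounds[i + 1] - bounds[i]        # number of steps the control is held
        A = [a ** j for j in range(d + 1)]                               # A_j = a^j
        B = [b * sum(a ** k for k in range(j)) for j in range(d + 1)]    # B_j
        den = B[d] * P[i + 1] * B[d] + sum(B[j] * q * B[j] for j in range(1, d)) + d * r
        num = B[d] * P[i + 1] * A[d] + sum(B[j] * q * A[j] for j in range(1, d))
        K[i] = num / den                                                 # (21)
        P[i] = ((A[d] - B[d] * K[i]) ** 2 * P[i + 1] + q
                + sum((A[j] - B[j] * K[i]) ** 2 * q for j in range(1, d))
                + d * K[i] ** 2 * r)                                     # (22)
    return K, P


def simulate_cost(a, b, q, r, change_idx, M, K, x0):
    """Roll the closed loop forward and accumulate J_M directly."""
    x, u, cost, i = x0, 0.0, 0.0, 0
    for k in range(M):
        if i < len(change_idx) and k == change_idx[i]:
            u = -K[i] * x                    # (23): u(n_i) = -K(i) x(n_i)
            i += 1
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q * x * x                  # add the terminal cost


K, P = temporal_gains(1.0, 1.0, 1.0, 1.0, change_idx=[0], M=2)
print(K[0], P[0])
```

For any schedule, `simulate_cost(...)` started from `x0` should agree with `P[0] * x0 ** 2` up to rounding, which gives a quick consistency check on an implementation of the recursion.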

To prove the optimality of this control law we need the following lemmas.

Lemma 1 If $Q$ is positive semi-definite and $R$ is positive definite, then the matrices $P(i)$, $i = \nu, \nu-1, \nu-2, \ldots, 0$, are positive semi-definite. Hence, the $P(i)$'s are symmetric from the definition of a positive semi-definite matrix.

Proof Since $P(\nu) = Q$, by assumption $P(\nu)$ is positive semi-definite. Assume that for $k = i+1$, $P(k)$ is positive semi-definite. We use induction to prove that $P(i)$ is positive semi-definite. Note that $Q$ is positive semi-definite and $R$ is positive definite. From (22) we have

$$\begin{aligned} P(i) ={}& [A_{n_{i+1}-n_i} - B_{n_{i+1}-n_i} K(i)]^T P(i+1) [A_{n_{i+1}-n_i} - B_{n_{i+1}-n_i} K(i)] + Q \\ &+ \sum_{j=1}^{n_{i+1}-n_i-1} [A_j - B_j K(i)]^T Q [A_j - B_j K(i)] + (n_{i+1}-n_i) K^T(i) R K(i) \end{aligned} \tag{24}$$

Since $P(i+1)$ and $Q$ are positive semi-definite, $R$ is positive definite, and $(n_{i+1}-n_i) > 0$, it is easy to verify that $y^T P(i) y \ge 0$ for all $y \in \mathcal{R}^n$. This means that $P(i)$ is positive semi-definite. This inductive procedure proves the lemma.


Lemma 2 Given $D_\nu$, the inverse matrix in (21) always exists.

Proof Let

$$V = B^T_{n_{\nu-m+1}-n_{\nu-m}} P(\nu-m+1) B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1} B_j^T Q B_j + (n_{\nu-m+1}-n_{\nu-m}) R.$$

From Lemma 1, $P(\nu-m+1)$ is positive semi-definite. Therefore, $y^T V y > 0$ for every nonzero vector $y$ of the input dimension, because $Q$ is positive semi-definite, $R$ is positive definite and $n_{\nu-m+1}-n_{\nu-m} > 0$. This implies that $V$ is positive definite. Hence the inverse matrix exists.


Theorem 1 Given $D_\nu$, the $K(i)$ $(i = 0, 1, 2, \ldots, \nu-1)$ obtained from the above procedure are the optimal feedback gains which minimize the cost function $J_M$ (and $J'_M$) on $[0, M\Delta]$.

Proof Note that given $D_\nu$, $J_M$ is a convex function of $u(n_i)$, $i = 0, 1, \ldots, \nu-1$. Thus the above feedback control law is optimal.


Lemma 3 If $p < q$ and $D_p \subset D_q$, then $J_M^o(D_q) \le J_M^o(D_p)$, where $J_M^o(D_p)$ and $J_M^o(D_q)$ are the optimal costs of controls which change controls at time instants in $D_p$ and $D_q$ respectively.

Proof Suppose that $J_M^o(D_q) > J_M^o(D_p)$. Then, in controlling the system with $D_q$, if we do not change controls at time instants in $D_q - D_p$ and change controls at time instants in $D_p$ to the same control inputs that were exercised to get $J_M^o(D_p)$ with $D_p$, we obtain a cost $J_M$ which is equal to $J_M^o(D_p)$. This contradicts the fact that $J_M^o(D_q)$ is the minimum cost obtainable with $D_q$, since we have found a $J_M$ which is equal to $J_M^o(D_p)$ and therefore less than $J_M^o(D_q)$. Hence, $J_M^o(D_q) \le J_M^o(D_p)$.

This lemma implies that if we do not take the computation cost, $\mu$, into consideration, then the more control exercising points there are, the better the controller is (less cost). With the computation cost included in the cost function, the statement above is no longer true. Therefore we need to search for an optimal $D_\nu$ which minimizes the cost function $J'_M$. The following sections provide a detailed discussion on searching for such an optimal solution. Note that if we let $D_\nu = D_M$ then the optimal temporal control law is the same as the traditional linear feedback optimal control law.
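
The last remark can be verified directly from the recursion. As a short worked check (a sketch, with the time index shifted so that $P(M) = Q$): when $D_\nu = D_M$, every interval has length $n_{i+1} - n_i = 1$, so the sums over $j$ in (21) and (22) are empty and $A_1 = A$, $B_1 = B$. The recursion then collapses to the standard discrete-time linear-quadratic regulator equations:

```latex
\begin{align*}
K(i) &= \left[ B^T P(i+1) B + R \right]^{-1} B^T P(i+1) A, \\
P(i) &= \left[ A - B K(i) \right]^T P(i+1) \left[ A - B K(i) \right] + Q + K^T(i) R K(i),
\qquad P(M) = Q .
\end{align*}
```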
3.3 Optimal Temporal Control Law over $D_\nu$ Space with $\nu$ Given

When the number of control changing points, $\nu$, and an initial system state $x(0)$ are given, we search over a set of possible $D_\nu$'s and $u(D_\nu)$'s such that the cost function $J_M$ is minimized. This can be done by varying the $\nu-1$ control changing time instants, $t_i$, $i = 1, 2, \ldots, \nu-1$ (since $t_0 = 0$), over the discrete set $D_M = \{0, \Delta, 2\Delta, \ldots, (M-1)\Delta\}$ and applying the technique developed in the previous section for each given $D_\nu$. Let us denote the $D_\nu$ which minimizes $J_M$ as $D_\nu^o$. Note that when $\nu$ is given, minimizing $J_M$ is equivalent to minimizing $J'_M$. Since both $D_\nu$ and $u(D_\nu)$ are control variates, to be able to find a globally optimal solution, either an exhaustive search or a global search method such as a Genetic Algorithm or Simulated Annealing should be considered. Later we present a numerical example in which an exhaustive search with the Steepest Descent Search method is used. Searching for a globally optimal solution for a temporal controller calls for further research.

3.4  Optimal  Temporal  Control  Law 

Assume that a maximum number of control changing points, $\nu_{max}$, is given. By varying $\nu$ from 1 to $\nu_{max}$ we can find $D^o$ to obtain a globally optimal temporal controller which minimizes $J'_M$. This can be done by first searching for $D_\nu^o$ for each given $\nu$ and then comparing the cost function $J'_M$ at each $D_\nu^o$, $\nu = 1, 2, \ldots, \nu_{max}$. That is, let $J'_{M,\nu} = x^T(0) P(0) x(0) + \nu\mu$, where $P(0)$ is calculated at $D_\nu^o$ as in the previous section. Then we can obtain a global minimum cost $J'^o_M = \min_\nu J'_{M,\nu}$ and an optimal number of control changes, $\nu^o$, at which $J'_{M,\nu^o} = J'^o_M$.
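
For small $M$ the two nested searches (over $D_\nu$ for each $\nu$, then over $\nu$) can be written as a plain exhaustive loop. The sketch below is illustrative only: `best_temporal_schedule` and `cost_fn` are hypothetical names, and `cost_fn` is assumed to return the optimal control cost $x^T(0) P(0) x(0)$ for a given tuple of change indices. A toy stand-in is used here so that the example is self-contained.

```python
from itertools import combinations

def best_temporal_schedule(cost_fn, M, nu_max, mu):
    """Exhaustive search for the change instants minimizing J_M + nu * mu.

    cost_fn(change_idx) must return the optimal control cost J_M for the
    given tuple of change indices (n_0 = 0, ..., n_{nu-1}).
    """
    best = (float("inf"), None)
    for nu in range(1, min(nu_max, M) + 1):
        # n_0 is fixed at 0; choose the remaining nu - 1 indices from 1..M-1.
        for rest in combinations(range(1, M), nu - 1):
            idx = (0,) + rest
            total = cost_fn(idx) + nu * mu      # J'_{M,nu} for this schedule
            if total < best[0]:
                best = (total, idx)
    return best


# Toy stand-in for x(0)^T P(0) x(0): the cost falls as more changes are allowed.
toy_cost = lambda idx: 12.0 / len(idx)
total, idx = best_temporal_schedule(toy_cost, M=6, nu_max=6, mu=1.5)
print(total, idx)
```

The number of candidate schedules grows combinatorially with $M$, which is why the text also considers local and stochastic search methods.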

3.5  Terminal  State  Constraints 

The terminal state constraints may be used to check whether the optimal temporal controller with $D^o$ can drive the system state to a permissible final state within a given time. Let $X_f$ be a set of allowed terminal states. If $x(n_\nu) \in X_f$, then the control law is said to be stable in terms of the terminal state constraints, and not stable if $x(n_\nu) \notin X_f$. If the globally optimal temporal controller obtained from the above procedure is not stable, $\nu$ should be increased until a stable one is found. One way of specifying terminal state constraints for regulators might be $|x(M)_i| \le \epsilon_i$, where $x(M)_i$ is the $i$-th element of the state vector $x(M)$.
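
A componentwise bound of this kind is simple to test once the final state is known. The following sketch is an assumption-laden illustration: the helper name `satisfies_terminal_constraints` is hypothetical, and the bound is taken to be non-strict.

```python
def satisfies_terminal_constraints(x_final, eps):
    """Return True when |x(M)_i| <= eps_i holds for every state component."""
    if len(x_final) != len(eps):
        raise ValueError("x_final and eps must have the same length")
    return all(abs(xi) <= ei for xi, ei in zip(x_final, eps))


inside = satisfies_terminal_constraints([0.01, -0.002], [0.05, 0.005])
outside = satisfies_terminal_constraints([0.01, -0.02], [0.05, 0.005])
print(inside, outside)
```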
3.6 Algorithm to Derive an Optimal Temporal Controller

To summarize the above discussion, we provide in Figure 2 a complete algorithm to search for a globally optimal temporal controller under the assumption that the initial state $x(0)$ is given.

In the algorithm, a neighbor of $D_\nu = \{n_0\Delta, n_1\Delta, n_2\Delta, \ldots, n_{\nu-1}\Delta\}$ is defined to be any member of the set $\{\{n_0\Delta, n'_1\Delta, \ldots, n'_{\nu-1}\Delta\} \;:\; |n'_i - n_i| \le 1,\ i = 1, 2, \ldots, \nu-1\}$.
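
The neighbor definition and the steepest descent step can be sketched as follows. This is an illustration under assumptions rather than the algorithm of Figure 2 itself: schedules are represented by their index tuples $(n_0, \ldots, n_{\nu-1})$, candidates that are not strictly increasing or that leave $D_M$ are discarded, and `cost_fn` is a hypothetical callable standing in for the optimal cost obtained from Theorem 1.

```python
from itertools import product

def neighbors(idx, M):
    """All schedules whose indices n_1..n_{nu-1} each move by at most one step."""
    idx = tuple(idx)
    out = []
    for offs in product((-1, 0, 1), repeat=len(idx) - 1):
        cand = (idx[0],) + tuple(n + o for n, o in zip(idx[1:], offs))
        increasing = all(lo < hi for lo, hi in zip(cand, cand[1:]))
        if cand != idx and increasing and cand[-1] < M:
            out.append(cand)
    return out


def steepest_descent(cost_fn, start, M):
    """Move to the cheapest neighbor until no neighbor improves the cost."""
    current = tuple(start)
    current_cost = cost_fn(current)
    while True:
        candidates = neighbors(current, M)
        if not candidates:
            return current, current_cost
        best = min(candidates, key=cost_fn)
        best_cost = cost_fn(best)
        if best_cost >= current_cost:
            return current, current_cost      # local minimum reached
        current, current_cost = best, best_cost


# Toy cost with a single minimum at (0, 3, 7); a real cost would be x(0)^T P(0) x(0).
toy_cost = lambda idx: sum((n - t) ** 2 for n, t in zip(idx, (0, 3, 7)))
result = steepest_descent(toy_cost, start=(0, 1, 2), M=10)
print(result)
```

Each step examines up to $3^{\nu-1}$ neighbors, and the descent only guarantees a local minimum, which is why the algorithm restarts from several initial points.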

3.7  Optimal  Temporal  Controllers  over  an  Initial  State  Space 

Note that $D^o$ might become different if a new initial system state $\tilde{x}(0)$ is used instead of $x(0)$ when the state vector is in $\mathcal{R}^m$ where $m \ge 2$. This is because the cost function $J_M = x^T(0) P(0) x(0)$ depends on $x(0)$ as well as $P(0)$. Thus, $D^o$ is dependent on the initial state $x(0)$. However, when $m = 1$ it can be shown that $D^o$ is independent of any initial state. To see this, let $\tilde{x}(0) = k\,x(0) \in \mathcal{R}^1$ and let $P(0)$ and $\tilde{P}(0)$ be the optimal matrices with initial states $x(0)$ and $\tilde{x}(0)$, respectively, i.e.,

$$J_M^o(x(0)) = x^T(0) P(0) x(0)$$
$$J_M^o(\tilde{x}(0)) = \tilde{x}^T(0) \tilde{P}(0) \tilde{x}(0)$$

From the optimality of $P(0)$ with respect to $x(0)$,

$$x^T(0) \tilde{P}(0) x(0) \ge x^T(0) P(0) x(0) \tag{25}$$

Multiplying the above inequality by $k^2$ we have

$$k^2 x^T(0) \tilde{P}(0) x(0) = \tilde{x}^T(0) \tilde{P}(0) \tilde{x}(0) \ge k^2 x^T(0) P(0) x(0) = \tilde{x}^T(0) P(0) \tilde{x}(0) \tag{26}$$

On the other hand, due to the optimality of $\tilde{P}(0)$ we have

$$\tilde{x}^T(0) P(0) \tilde{x}(0) \ge \tilde{x}^T(0) \tilde{P}(0) \tilde{x}(0) \tag{27}$$

Therefore, $P(0) = \tilde{P}(0)$. This implies the optimality of $P(0)$ and $D^o$ for any initial state $\tilde{x}(0) \in \mathcal{R}^1$.

Generally speaking, the above result will not hold for the cases $m \ge 2$. However, using the same argument discussed above we can prove that for any initial state $\tilde{x}(0) = k\,x(0)$, $x(0)$ and $\tilde{x}(0)$ will have the same $D^o$ as well as the same $P(0)$.
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1/°  =  1 
j;j  =  oo. 

for  i/  =  1  to  Umax  { 

/*  Several  different  search  starting  points  */ 
for  j  =  1  to  NumJnitPiSi,  { 

/*  Iterate  until  a  local  minimum  is  found  -  Steepest  Descent  Search  */ 
while  (MinimumFound  !=  True)  { 

Find  optimal  costs  for  neighboring  points  of  using  theorem  1 
if  (4  has  a  Local  Minimum  at  D^) 
then  { 

MinimumFound  =  True 
=  Cost(  at  } 

else 


=  a  neighbor  of 


with  the  smallest  Jj^ 


then  { 


Figure  2;  Complete  algorithm  to  find  an  optimal  temporal  controller. 
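
The search loop of Figure 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Mathematica implementation: the optimal cost for a given set of control instants (obtained from theorem 1 in the text) is abstracted as a caller-supplied function `cost`, and the names `neighbors`, `steepest_descent`, `multi_start`, and `toy_cost` are hypothetical. Boundary handling (indices must stay strictly increasing and below the horizon $M$) is also an assumption.

```python
from itertools import product


def neighbors(indices, horizon):
    """Yield the neighbors of a strictly increasing index tuple (n_0, ..., n_{v-1}).

    n_0 stays fixed and every other index moves by -1, 0, or +1. Candidates that
    are not strictly increasing, or whose last index reaches the horizon M, are
    skipped (an assumption; the text does not spell out boundary handling).
    """
    first, rest = indices[0], tuple(indices[1:])
    for deltas in product((-1, 0, 1), repeat=len(rest)):
        candidate = (first,) + tuple(n + d for n, d in zip(rest, deltas))
        if candidate == tuple(indices):
            continue  # the point itself is not a useful move
        increasing = all(a < b for a, b in zip(candidate, candidate[1:]))
        if increasing and candidate[-1] < horizon:
            yield candidate


def steepest_descent(start, cost, horizon):
    """Move to the cheapest neighbor until no neighbor improves the cost."""
    current = tuple(start)
    current_cost = cost(current)
    while True:
        best = min(neighbors(current, horizon), key=cost, default=None)
        if best is None or cost(best) >= current_cost:
            return current, current_cost  # local minimum reached
        current, current_cost = best, cost(best)


def multi_start(starts, cost, horizon):
    """Run the descent from several starting points and keep the cheapest result."""
    results = (steepest_descent(s, cost, horizon) for s in starts)
    return min(results, key=lambda result: result[1])


def toy_cost(indices):
    # Hypothetical stand-in cost with a single minimum at (0, 2, 10);
    # it is NOT the quadratic performance index of the paper.
    _, n1, n2 = indices
    return (n1 - 2) ** 2 + 0.1 * (n2 - 10) ** 2
```

For example, `steepest_descent((0, 1, 10), toy_cost, 40)` stops at `(0, 2, 10)` for this stand-in cost. Restarting from several points, as `multi_start` does, matters only when the cost surface has more than one local minimum.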
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4  Implementation 


To implement temporal control, we need to calculate and store the $K(i)$ matrices in (22) and use them
when controlling the system utilizing (23). Note that in traditional optimal linear control a similar
matrix is obtained and used at every time instant in $D_M$ to generate the control input value. While
the feedback gain matrices for a traditional linear optimal controller are independent of initial states,
the number of control exercises, $\nu$, and the $K(i)$ matrices are dependent on initial states for temporal
control systems. But, if the possible set of initial states is in $\mathbb{R}^1$, they are independent of the initial
states. Effective deployment of temporal control requires that we know the range of initial state
values and generate $K(i)$ matrices for each group. A sensitivity analysis is required to determine
how many distinct matrices need to be stored.

In order to implement temporal control we require an operating system that supports scheduling
control computations at specific time instants. The Maruti system developed at the University of
Maryland is a suitable host for the implementation of temporal control [10, 8, 7]. In Maruti, all
executions are scheduled in time and the time of execution can be modified dynamically, if so
desired. This is in contrast with traditional cyclic executives often used in real-time systems, which
have a fixed, cyclic operation and which are well suited only for sampled data control systems
operating in a static environment. It is the availability of a system such as Maruti that allows
us to consider the notion of temporal control, in which time becomes an emergent property of the
system.


5  Example 

To illustrate the advantages of a temporal control scheme let us consider a simple example of a rigid
body satellite control problem [12]. The system state equations are as follows:

$$x(k+1) = \begin{bmatrix} 0 & 1 \\ -1 & 2 \end{bmatrix} x(k) + \begin{bmatrix} 0 \\ 0.00125 \end{bmatrix} u(k)$$
$$y(k) = \begin{bmatrix} 1 & 1 \end{bmatrix} x(k)$$

where $k$ represents the time index and one unit of time is the discretized subinterval of length
$\Delta = 0.05$. The linear quadratic performance index $J_M$ in (5) is used here with the following
parameters.

$$Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad R = 0.0001, \quad \mu = 0.02 \text{ and } 0.01, \quad M = 40, \quad \Delta = 0.05, \quad \varepsilon_i = 0.01,\ i = 1, 2 \qquad (28)$$

Figure 3: Optimal Linear Control with $\Delta = 0.05$.

The objective of the control is to drive the satellite to the zero position and the desired goal
state is $x_f = [0, 0]^T$. The terminal state constraint is $|x_i(40)| \le \varepsilon_i$, $i = 1, 2$. With the equal
sampling interval $\Delta = 0.05$ and $M = 40$ the optimal linear feedback control of this system has cost
function $J_M = 0.984678$ (without computational cost) and $J'_M = 1.784678$ (with computational
cost) and is shown in Figure 3. The terminal state constraint is satisfied at O.Ssec.

If we apply the temporal control scheme presented above to this problem with $\mu = 0.02$ we find
that the optimal number of control changes for this example is 3 and $D_3^\circ = \{0, 2\Delta, 10\Delta\}$ with a
cost $J'_M = 1.08388$. Note that the 40 step optimal linear feedback controller given above has a cost
$J'_M = 1.784678$ when computation cost is considered. Table 1 shows how this optimal controller
is obtained when we set $\nu_{max} = 7$. Figure 4(a) shows the system trajectory when this three-step
optimal temporal controller is used to control the system. This trajectory satisfies the terminal
state constraint at O.Ssec as well. Also, the maximum control input magnitudes, $|u|_{max}$, in both
controllers lie within the same bound of 50, which may be another constraint on control.

$\nu$   $D_\nu^\circ$            Cost($J'_M$) with $\mu$ = 0.02      Cost($J'_M$) with $\mu$ = 0.01
1   {0}                  4.63089 + $\mu$ = 4.65089       4.63089 + $\mu$ = 4.64089
2   {0,1}                1.44603 + 2$\mu$ = 1.48603      1.44603 + 2$\mu$ = 1.46603
3   {0,2,10}             1.02388 + 3$\mu$ = 1.08388      1.02388 + 3$\mu$ = 1.05388
4   {0,2,9,11}           1.02224 + 4$\mu$ = 1.10224      1.02224 + 4$\mu$ = 1.06224
5   {0,1,3,8,11}         0.996968 + 5$\mu$ = 1.096968    0.996968 + 5$\mu$ = 1.046968
6   {0,1,3,8,11,24}      0.996746 + 6$\mu$ = 1.116746    0.996746 + 6$\mu$ = 1.056746
7   {0,1,3,8,11,23,25}   0.996745 + 7$\mu$ = 1.136745    0.996745 + 7$\mu$ = 1.066745

Table 1: Calculating optimal temporal controllers.

The optimal temporal controller found with $\mu = 0.01$ has $\nu = 5$ and $D_5^\circ = \{0, \Delta, 3\Delta, 8\Delta, 11\Delta\}$
with a cost $J_M = 0.996968$. Note that this cost is even less than 1.01269 which is obtained from
the optimal controller with equal sampling period 0.1 sec and 20 control changes.
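
The selection of the optimal number of control changes from Table 1 can be checked with a few lines of Python. This is only an illustrative sketch: the control costs are copied from Table 1, while the names `CONTROL_COST`, `total_cost`, and `best_nu` are hypothetical. The total cost adds $\nu\mu$ (one computation cost $\mu$ per control change) to the control cost.

```python
# Control cost (state and input cost only) of the best controller found for
# each number of control changes nu, copied from Table 1.
CONTROL_COST = {
    1: 4.63089,
    2: 1.44603,
    3: 1.02388,
    4: 1.02224,
    5: 0.996968,
    6: 0.996746,
    7: 0.996745,
}


def total_cost(nu, mu):
    """Control cost plus nu computations charged at mu each."""
    return CONTROL_COST[nu] + nu * mu


def best_nu(mu):
    """Number of control changes with the smallest total cost."""
    return min(CONTROL_COST, key=lambda nu: total_cost(nu, mu))
```

With these figures, `best_nu(0.02)` selects three control changes and `best_nu(0.01)` selects five, in agreement with the text: a cheaper computation makes additional control changes worthwhile.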

If we change control values only at three time instants with equal sampling period, $13\Delta =
0.65$ sec, the total cost incurred is 2.2823 (without computational cost) on the time interval [0, 2].
The cost is more than twice that of our optimal temporal controller and the terminal state constraint
is not satisfied even at the end of the controlling interval of 2.0 sec. Figure 4(b) clearly shows the
advantages of using an optimal temporal controller over using an optimal controller with equidistant
sampling. Their performances are noticeably different though both of them change controls
at three time instants. It is clear that the optimal temporal control with three control changes
performs almost the same as the 40 step linear optimal controller does. This implies that enforcing a
constant sampling rate throughout the entire controlling interval may simply waste computational
power which otherwise could be used for other concurrent controlling tasks in critical systems.

Obtaining $D_3^\circ$ for this example was simple since $J_{40}$ has only one minimum over the entire set
of possible $D_3$'s on $[0, 40\Delta]$. Figure 5(a) and Figure 5(b) show that $J_{40}$ has only one local (global)
minimum at $D_3^\circ = \{0, 2\Delta, 10\Delta\}$. We got this optimal $D_3^\circ$ by doing steepest descent search with the
starting point $D_3 = \{0, \Delta, 10\Delta\}$ after searching for only three points, $\{0, \Delta, 10\Delta\}$, $\{0, 2\Delta, 10\Delta\}$,
$\{0, 3\Delta, 10\Delta\}$. Also, Figure 5(a) shows that choosing $n_1$ has greater influence on the total cost than
$n_2$ since the cost varies more radically along the $n_1$ axis in the figure. This means that the initial
stage of the control needs more attention than the later stage in this linear control problem.

But, if we change one of the parameters of the performance index function, $R$, from 0.0001 to 0.001
we get two local minima, at $\{0, \Delta, 2\Delta\}$ and $\{0, 3\Delta, 19\Delta\}$, one of which is the
optimal one with less cost. Figure 6 shows this fact. In this case we need to use the steepest descent
search method at least twice with different search starting points to get an optimal solution. We
implemented this steepest descent search algorithm in Mathematica and used it to generate $D_\nu^\circ$ for
several examples by varying $\nu$. For our examples of linear time invariant system control problems
the number of local minima was small enough that we could efficiently apply this search method
just a few times with different initial $D_\nu$ to get a global minimum without doing an exhaustive
search over the entire space.

Figure 4: Control trajectories with 3 control changes. (a) Optimal temporal control with $D_3^\circ =
\{0, 2\Delta, 10\Delta\}$. (b) Optimal linear control with $13\Delta$ (0.65 sec) period.

Figure 6: Costs near the two local minima with $R = 0.001$.

6  Discussion 

Employing the temporal control methodology in concurrent real-time embedded systems will have
a significant impact on the way computational resources are utilized by control tasks. A minimal
amount of control computations can be obtained for a given regulator by which we can achieve
almost the same control performance as that of a traditional controller with equal sampling
period. This significantly reduces the CPU time for each controlling task and thus increases the
number of real-time control functions which can be accommodated concurrently in one embedded
system. Particularly, in a hierarchical control system, if temporal controllers can be employed for
lower level controllers the higher level controllers will have a great degree of flexibility in managing
resource usage by adjusting the computational requirements of each lower level controller. For example,
in emergency situations the higher level controller may force the lower level controllers to run as
infrequently as they possibly can (thus freeing computational resources for handling the emergency).
In contrast, during normal operations the temporal control tasks may run as necessary, and the
additional computation time can be used for higher level functions such as monitoring and planning.

In  addition,  the  method  developed  in  Section  3.2,  which  calculates  an  optimal  controller  when 
control  changing  time  instants  are  given,  can  be  applied  to  the  case  in  which  the  control  computing 
time  instants  cannot  be  periodic.  For  example,  when  a  small  embedded  controller  is  used  to 
control  several  functions,  it  may  be  a  lot  better  to  design  a  temporal  controller  for  each  function 
such  that  the  required  computational  resources  are  appropriately  scheduled  while  retaining  the 
required  degree  of  control  for  each  function. 

7  Conclusion 

In this paper we proposed a temporal control technique based on a new cost function which takes
into account computational cost as well as state and input cost. In this scheme new control input
values are defined at time instants which are not necessarily regularly spaced. For the linear
control problem we showed that almost the same quality of control can be achieved while far fewer
computations are used than in a traditional controller.

The proposed formulation of temporal control is likely to have a significant impact on the
way concurrent embedded real-time systems are designed. In a hierarchical control environment,
this approach is likely to result in designs which are significantly more efficient and flexible than
traditional control schemes. As they use fewer computational resources, the lower level temporal
controllers will make the resources available to the higher level controllers without compromising
the quality of control.
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Abstract 

Real-time systems differ from conventional systems in that every task in a real-time
system has a timing constraint. Failure to execute the tasks under the timing constraints
may  result  in  fatal  errors.  Sometimes,  it  may  be  impossible  to  execute  all  the  tasks  in  the  task 
set  under  their  timing  constraints.  Considering  a  system  with  limited  resources,  one  solution 
to  handle  the  overload  problem  is  to  reject  some  of  the  tasks  in  order  to  generate  a  feasible 
schedule  for  the  rest.  In  this  paper,  we  consider  the  problem  of  scheduling  a  set  of  tasks  without 
preemption  in  which  each  task  is  assigned  criticality  and  weight.  The  goal  is  to  generate  an 
optimal  schedule  such  that  all  of  the  critical  tasks  are  scheduled  and  then  the  non-critical  tasks 
are  included  so  that  the  weight  of  rejected  non-critical  tasks  is  minimized.  We  consider  the 
problem  of  finding  the  optimal  schedule  in  two  steps.  First,  we  select  a  permutation  sequence 
of  the  task  set.  Secondly,  a  pseudo-polynomial  algorithm  is  proposed  to  generate  an  optimal 
schedule  for  the  permutation  sequence.  If  the  global  optimal  is  desired,  all  permutation  sequences 
have  to  be  considered.  Instead,  we  propose  to  incorporate  the  simulated  annealing  technique  to 
deal  with  the  large  search  space.  Our  experimental  results  show  that  our  algorithm  is  able  to 
generate  near  optimal  schedules  for  the  task  sets  in  most  cases  while  considering  only  a  limited 
number  of  permutations. 
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1  Introduction 


Real-time computer systems are essential for all embedded applications, such as robot control, flight
control, and medical instrumentation. In such systems, the computer is required to support the
execution of applications in which the timing constraints of the tasks are specified by the physical
system being controlled. The correctness of the system depends on the temporal correctness as
well as the functional correctness of the tasks. Failure to satisfy the timing constraints can incur
fatal errors. How to schedule the tasks so that their timing constraints are met is crucial to the
proper operation of a real-time system.

As an example of an embedded system, let us consider an air defense system which monitors
an air space continuously using radars. Whenever an intruder is identified, the embedded control
system characterizes it and proceeds to initiate the responsive action in a timely manner. The
temporal constraints for this phase of processing are different depending on the intruder, whether
it is a missile, a fighter, a bomber, a dummy, etc. Such a system is designed to handle a number of
intruders concurrently. If the processing requests exceed the capacity of the system, we expect the
system to handle a set of the most significant intruders, and not any arbitrary set of intruders. This
involves rejecting the processing of some real-time tasks based on their importance. In this paper,
we consider the problem of creating a schedule for a set of tasks such that all critical tasks are
scheduled, and then, among the non-critical tasks we select those which can be scheduled feasibly
while maximizing the sum of the weights of selected non-critical tasks.

As all systems have finite resources, their ability to execute a set of tasks while meeting the
temporal requirements is limited. Clearly, overload conditions may arise if more tasks have to be
processed than the available set of resources can handle. Under such overload conditions, we have
two choices. We may augment the resources available, or reject some tasks (or both). In [8], a
technique was presented to handle transient overloads by taking advantage of redundant computing
resources. Another permissible solution to this problem is to reject some of the tasks in order to
generate a feasible schedule for the rest. Once a task is accepted by the system, the system should
be able to finish it under its timing constraint. Some algorithms have been shown to perform
well under low or moderate resource utilization. However, their performance degrades if the system
is overloaded [2]. For example, the EDF algorithm has been shown to be optimal for a periodic task
set [6]. If there exists a feasible schedule for the task set, EDF can come up with one. However,
if the task set is not feasible, EDF may perform unsatisfactorily. The reason is that a task with an
urgent deadline may not be able to finish before its deadline. But, due to its urgent deadline, the
task has a high priority to use the processor and thus keeps wasting CPU time until the task
expires after its deadline. The waste of CPU time may further prevent other tasks from meeting
their deadlines. The other problem is that there is little control over which tasks will meet their
deadlines and which will not.

For an overloaded system, how to select tasks for rejection on the basis of their importance
becomes a significant issue. When the tasks have equal weight, an optimal schedule can be defined
to be one in which the number of rejected tasks is minimized. In our previous study [3], we used a
super sequence based scheduling algorithm to compute the optimal schedule for the tasks. In this
paper, the criticality of the tasks is taken into consideration. Basically, if a task can not meet
its deadline, it is rejected so that CPU time is not wasted. Secondly, we would like to
schedule tasks such that the less important tasks may be rejected in favor of the more important
tasks. We classify tasks into two categories: critical and non-critical. The critical tasks are crucial
to the system such that they must not be rejected. The non-critical tasks are given weights to
reflect their importance, and are allowed to be rejected. A schedule is feasible if all critical tasks
in the task set are accepted and are guaranteed to meet their timing constraints. If there exists
no feasible schedule for the task set, the task set is considered infeasible. The loss of a schedule is
defined to be the sum of the weights of the rejected non-critical tasks. A schedule is optimal if it
is feasible and the loss of the schedule is minimum.

We first propose a Permutation Scheduling Algorithm (PSA) to generate an optimal schedule
for a permutation, which is a well defined ordering of tasks. When it comes to scheduling a task set
of $n$ tasks, in the worst case there might be up to $n!$ permutations to consider. We propose a Set
Scheduling Algorithm (SSA) which incorporates the simulated annealing technique [9] to deal with
the large search space of permutations. PSA is invoked by SSA to compute the optimal schedule for
each permutation. Taking the feedback from the schedulability and loss of the schedule generated
by PSA, SSA is able to control the progress of the search for an optimal schedule for the task set. Our
experimental results show that SSA is able to generate feasible schedules for task sets consisting of
100 tasks with success ratios no less than 98% and loss ratios less than 10% for most cases while
searching fewer than 5,000 permutations. For each permutation, the average number of schedules
computed to generate an optimal schedule by PSA, which is invoked by SSA, is usually less than
500. The SSA algorithm can be considered efficient in dealing with the exponential search space
for coming up with a satisfactorily near optimal schedule.

In  the  following  section,  we  define  the  scheduling  problem.  In  section  3,  we  present  the  idea 
about  how  to  schedule  a  permutation.  In  section  4,  we  incorporate  the  technique  of  simulated 
annealing  and  discuss  how  to  schedule  a  task  set.  In  section  5,  the  results  of  our  experiments  are 
presented,  which  is  followed  by  our  conclusion. 

2  The  Problem 

A task set is represented as $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$. A task $\tau_i$ can be characterized as a record of
$(r_i, c_i, d_i, w_i)$, representing the ready time, computation time, deadline, and criticality of the $i$th
task. Time is expressed as a real number. A task can not be started before its ready time. Once
started, the task must use the processor without preemption for $c_i$ time units, and be finished
by its deadline. If a task is very important for the system such that rejection of the task is not
allowed, $w_i$ is set to be CRITICAL. Otherwise, $w_i$ is assigned an integral value to indicate its
importance, and the task is subject to rejection if necessary. A permutation sequence, or simply
a permutation, is an ordered sequence of tasks in the task set. Scheduling is a process of binding
starting times to the tasks such that each task executes according to the schedule. Note that a
non-preemptive schedule on a single processor implies a sequence for the execution of tasks. For the
convenience of our discussion, we hereafter use a sequence to represent the schedule in the context.
A permutation is denoted by $\mu = (\tau_1, \ldots, \tau_n)$, where $\tau_i$ is the $i$th task in the permutation. A prefix
of a permutation is denoted by $\mu_k = (\tau_1, \ldots, \tau_k)$.

To schedule a task set, we need to take into consideration the possible permutations in the task
set. We first consider an algorithm for scheduling a permutation. The finish time of a schedule is
the finish time of the last task in the schedule. Let $S_k(t)$ denote a schedule of $\mu_k$ with finish time
no more than $t$. We use $W(S_k(t))$ to represent the weight of $S_k(t)$, which is the sum of the weights
of non-critical tasks in the schedule. A feasible schedule of $\mu_k$ is defined as follows:

Definition: $S_k(t)$, $1 \le k \le n$, is a feasible schedule of $\mu_k$ at $t$ if and only if:

1. $S_k(t)$ is a subsequence of $\mu_k$,

2. the finish time of $S_k(t)$ is less than or equal to $t$, and

3. all critical tasks in $\mu_k$ are included in $S_k(t)$.

An optimal schedule of $\mu_k$ is defined as follows:

Definition: $\sigma_k(t)$ is an optimal schedule of $\mu_k$ at $t$, if and only if:

1. $\sigma_k(t)$ is a feasible schedule of $\mu_k$, and

2. for any feasible schedule $S_k(t)$ of $\mu_k$, $W(\sigma_k(t)) \ge W(S_k(t))$.

In other words, an optimal schedule is a feasible schedule with minimum loss. There may be
more than one optimal schedule for $\mu_k$ with finish time less than or equal to $t$. We denote by
$\Sigma_k(t)$ the set of all of the optimal schedules for $\mu_k$ at $t$. Hence, if $S_k(t) \in \Sigma_k(t)$, $S_k(t)$ is an optimal
schedule for $\mu_k$ at $t$.
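
The task record and the feasibility and loss definitions above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the names `Task`, `finish_time`, `is_feasible`, and `loss` are hypothetical, a weight of `None` stands for CRITICAL, and time is assumed to start at 0. The sketch also assumes that a schedule must respect each task's ready time and deadline, which the definition leaves implicit in the word "schedule".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    name: str
    ready: float             # r_i
    comp: float              # c_i
    deadline: float          # d_i
    weight: Optional[int]    # w_i, or None for a CRITICAL task


def finish_time(sequence):
    """Run the tasks back to back without preemption.

    Returns the finish time of the last task, or None if some task
    would miss its deadline.
    """
    clock = 0.0
    for task in sequence:
        clock = max(clock, task.ready) + task.comp
        if clock > task.deadline:
            return None
    return clock


def is_feasible(sequence, prefix, t):
    """Check the three conditions of a feasible schedule of `prefix` at time t."""
    remaining = iter(prefix)
    is_subsequence = all(any(task == other for other in remaining) for task in sequence)
    has_all_critical = all(task in sequence for task in prefix if task.weight is None)
    finish = finish_time(sequence)
    return is_subsequence and has_all_critical and finish is not None and finish <= t


def loss(sequence, prefix):
    """Sum of the weights of the rejected non-critical tasks."""
    return sum(task.weight for task in prefix
               if task.weight is not None and task not in sequence)
```

As a usage example with hypothetical timing values, take `t1 = Task("t1", 4, 3, 10, 10)`, `t2 = Task("t2", 3.5, 2, 10, 5)`, and a critical `t3 = Task("t3", 4, 2, 10, None)`. The sequence `[t2, t3]` is then feasible for the prefix `[t1, t2, t3]` at any $t \ge 7.5$ and has loss 10, while `[t1, t2]` is never feasible because it omits the critical task.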

The scheduling problem considered here is NP-complete. To prove that, its related decision
problem, which is defined to be computing a feasible schedule with loss no more than a given
bound, can be easily shown to be NP-complete. This can be done by restricting to the PARTITION
problem [1] by setting $r_i = 0$, $w_i = c_i$, $d_i = \frac{1}{2} \sum_{j=1}^{n} c_j$, for $1 \le i \le n$.
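
The restriction to PARTITION can be illustrated with a small brute-force check. This sketch is only meant to make the mapping concrete; the function names are hypothetical, and the loss bound is taken to be half of the total weight (an assumption about the bound used in the reduction, which the text does not state). With all ready times at 0 and a common deadline, a set of accepted tasks is schedulable exactly when its total computation time fits before the deadline.

```python
from itertools import combinations


def partition_as_tasks(sizes):
    """Map a PARTITION instance to tasks (r_i, c_i, d_i, w_i) with
    r_i = 0, w_i = c_i, and d_i equal to half of the total size."""
    half = sum(sizes) / 2
    return [(0, c, half, c) for c in sizes], half


def has_schedule_with_loss_at_most(tasks, bound):
    """Brute force over all subsets of accepted tasks (exponential time)."""
    total_weight = sum(w for _, _, _, w in tasks)
    for k in range(len(tasks) + 1):
        for accepted in combinations(tasks, k):
            finish = sum(c for _, c, _, _ in accepted)
            meets_deadlines = all(finish <= d for _, _, d, _ in accepted)
            rejected_weight = total_weight - sum(w for _, _, _, w in accepted)
            if meets_deadlines and rejected_weight <= bound:
                return True
    return False


def can_partition(sizes):
    """A perfect partition exists iff a schedule with loss <= half exists."""
    tasks, half = partition_as_tasks(sizes)
    return has_schedule_with_loss_at_most(tasks, half)
```

Accepted tasks must total at most half of the sum (deadline) and at least half of the sum (loss bound), so a schedule exists only when some subset sums to exactly half.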

3  Scheduling  a  Permutation 

We consider the problem of finding an optimal schedule for the task set in two steps: select a
permutation, and find an optimal schedule for the permutation. The methodology is presented in
Figure 1.

    Loop 1: Choose a permutation $\mu$ of $\Gamma$
        Loop 2: for $\mu_k$, $k = 1, 2, \ldots, n$
            Loop 3: compute $\sigma_k(t)$

Figure 1: Methodology

Clearly, to find the optimal schedule for the task set, all possible permutations have to be
considered. How to search the permutations will be addressed in section 4. In Loop 3, optimal
schedules for $\mu_k$ are computed at some time instants. Next, we discuss how to compute $\sigma_k(t)$ for a
given $t$, and then discuss how to determine the time instants for $\mu_k$.

3.1  Computing $\sigma_k(t)$

We use dynamic programming to compute $\sigma_k(t)$ based on $\sigma_{k-1}(t')$ with $t' \le t$. The criticality of
$\tau_k$ plays an important role in computing $\sigma_k(t)$.

If $\tau_k$ is a critical task, we have to schedule it, possibly at the cost of rejecting some of the
non-critical tasks. Hence, $\sigma_k(t) = S_{k-1}(t') \oplus \tau_k$, for some schedule $S_{k-1}(t')$, where $\oplus$ means
concatenation of the sequence and the task. The finish time of $S_{k-1}(t')$ must be no more than
$t - c_k$ in order to accommodate $\tau_k$, which leads to $t' \le t - c_k$. The best candidate could be
$\sigma_{k-1}(t - c_k)$. Hence,

$$\sigma_k(t) = \sigma_{k-1}(t - c_k) \oplus \tau_k \qquad (1)$$

which can be seen in Figure 2. Note that $\sigma_k(t)$ only exists for a proper range of $t$. That is, $\sigma_k(t)$ is
infeasible when $t$ is beyond the proper range, e.g., $t < r_k + c_k$, or if $\sigma_{k-1}(t - c_k)$ is infeasible. The
range will be considered in detail later.

If $\tau_k$ is non-critical, our concern is to obtain as large a weight for the schedule as possible, while
the critical tasks accepted previously must be kept in the schedule. Computation of $\sigma_k(t)$ is based
on the feasibility and the weights of the two candidate schedules, $\sigma_{k-1}(t)$ and $\sigma_{k-1}(t - c_k) \oplus \tau_k$.
That is, $\sigma_k(t)$ is the feasible candidate whose weight is not less than that of the other:

$$\sigma_k(t) = \arg\max_{S \in \{\sigma_{k-1}(t),\ \sigma_{k-1}(t - c_k) \oplus \tau_k\}} W(S) \qquad (2)$$

Figure 2: Scheduling for $\tau_k$

3.2  Time Instants for Computing $\sigma_k(t)$

From Equations 1 and 2, the computation of $\sigma_k(t)$ is based on $\sigma_{k-1}(t)$ and $\sigma_{k-1}(t - c_k)$.
We do not need to look at all time instants. To get the idea about how to determine the
time instants, consider the simple example in Figure 3, in which ready times, computation times,
deadlines, and weights are given to the tasks in $\mu_3 = (\tau_1, \tau_2, \tau_3)$. The following schedules for $\mu_3$
are optimal:

$\sigma_3(t)$ = infeasible                                    for $t < 6$
$\sigma_3(t) = (\tau_3)$,           $W(\sigma_3(t)) = 0$     for $6 \le t < 7.5$
$\sigma_3(t) = (\tau_2, \tau_3)$,   $W(\sigma_3(t)) = 5$     for $7.5 \le t < 9$
$\sigma_3(t) = (\tau_1, \tau_3)$,   $W(\sigma_3(t)) = 10$    for $9 \le t$

$w_1 = 10$, $w_2 = 5$, $w_3 =$ CRITICAL

Figure 3: $\mu_3 = (\tau_1, \tau_2, \tau_3)$

In general, there exist a number of subranges in each of which the schedules are exactly identical,
as illustrated in Figure 4. We only need to compute the schedules at the time instants
which delimit the subranges, i.e., 6, 7.5, and 9. We call these time instants scheduling points. The
scheduling points can be determined by the timing characteristics of the tasks.
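
The recursion of Section 3.1 can be sketched in Python and checked against the subranges quoted above. This is a sketch under stated assumptions, not the paper's algorithm: the timing values in `TASKS` are hypothetical (the figure's actual values are not reproduced in the text) and were chosen so that the subranges delimited by 6, 7.5, and 9 come out as in the example. The function name `optimal` is also hypothetical, and the handling of the "proper range" (capping the finish time at the deadline and requiring a start no earlier than the ready time) is an assumption.

```python
from functools import lru_cache

CRITICAL = None

# Hypothetical (name, ready, comp, deadline, weight) values for mu_3.
TASKS = (
    ("t1", 4.0, 3.0, 10.0, 10),
    ("t2", 3.5, 2.0, 10.0, 5),
    ("t3", 4.0, 2.0, 10.0, CRITICAL),
)


@lru_cache(maxsize=None)
def optimal(k, t):
    """Return (weight, task names) of an optimal schedule of the first k tasks
    with finish time at most t, or None if no feasible schedule exists."""
    if k == 0:
        return (0, ()) if t >= 0 else None
    name, ready, comp, deadline, weight = TASKS[k - 1]

    # Candidate 1: append task k to the best schedule of the shorter prefix
    # that leaves room for it before min(t, deadline).
    appended = None
    latest_start = min(t, deadline) - comp
    if latest_start >= ready:
        previous = optimal(k - 1, latest_start)
        if previous is not None:
            gain = 0 if weight is CRITICAL else weight
            appended = (previous[0] + gain, previous[1] + (name,))

    if weight is CRITICAL:
        return appended  # a critical task must be scheduled

    # Candidate 2: reject task k and keep the best schedule of the shorter prefix.
    skipped = optimal(k - 1, t)
    candidates = [c for c in (skipped, appended) if c is not None]
    return max(candidates, key=lambda c: c[0], default=None)
```

With these hypothetical values, `optimal(3, t)` is infeasible for $t < 6$, accepts only `t3` up to 7.5, accepts `t2` and `t3` up to 9, and accepts `t1` and `t3` from 9 onward, so evaluating it at the three scheduling points is enough to describe every $t$.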


Figure  4:  Identical  subranges 


3.3  Definition  of  Scheduling  Points 

We denote the $j$th scheduling point for $\mu_k$ by $\lambda_{k,j}$ and call $j$ the index of $\lambda_{k,j}$. Hence, $\sigma_k(\lambda_{k,j})$
denotes an optimal schedule for $\mu_k$ at the scheduling point $\lambda_{k,j}$. Let $n_k$ be the total number of
scheduling points at which we need to schedule $\mu_k$. For simplicity, $\lambda_k$ denotes the set of
$\lambda_{k,1}, \lambda_{k,2}, \ldots, \lambda_{k,n_k}$, and $\sigma_k$ the set of $\sigma_k(\lambda_{k,1}), \sigma_k(\lambda_{k,2}), \ldots, \sigma_k(\lambda_{k,n_k})$. The scheduling
points are defined as follows.

Definition: The set of scheduling points, $\lambda_k$, is complete if and only if:

1. for any $t < \lambda_{k,1}$, $\Sigma_k(t)$ is empty,

2. for any $\lambda_{k,j} \le t < \lambda_{k,j+1}$, for $j = 1, \ldots, n_k - 1$, $\sigma_k(\lambda_{k,j}) \in \Sigma_k(t)$, and

3. for any $t \ge \lambda_{k,n_k}$, $\sigma_k(\lambda_{k,n_k}) \in \Sigma_k(t)$.

Note that $\Sigma_k(t)$ being empty means that there is no feasible schedule with finish time less
than or equal to $t$. Also remember that $\sigma_k(\lambda_{k,j}) \in \Sigma_k(t)$ means that $\sigma_k(\lambda_{k,j})$ is an optimal
schedule for $\mu_k$ at $t$. The completeness of scheduling points indicates that all of the optimal
schedules in the positive real time domain can be represented by the optimal schedules $\sigma_k$
at the scheduling points. In addition, the set of scheduling points, $\lambda_k$, is minimum, if and only if
$W(\sigma_k(\lambda_{k,j})) < W(\sigma_k(\lambda_{k,j+1}))$ for any $1 \le j \le n_k - 1$. This ensures that there does not exist any
redundant scheduling point which, if removed, does not violate the completeness of the scheduling
points. The sets of scheduling points that we will discuss are complete and minimum.


3.4  An  Example  for  Deriving  Scheduling  Points 

The values of $\lambda_k$ depend on the temporal relations between $\tau_k$ and $\lambda_{k-1}$. The example in Figure 5
is used to illustrate the relations. We only describe the idea of deriving scheduling points by the
example, and will discuss it in more detail later. Assume that there are 5 scheduling points for $\mu_{k-1}$,
and we consider computing $\sigma_k$ based on $\sigma_{k-1}$. The current task, $\tau_k$, may be critical or non-critical.

[Time axis showing the scheduling points for $\mu_{k-1}$: $\lambda_{k-1,1}, \lambda_{k-1,2}, \lambda_{k-1,3}, \lambda_{k-1,4}, \lambda_{k-1,5}$, and the
scheduling points for $\mu_k$: $r_k + c_k$, $\lambda_{k-1,2} + c_k$, $\lambda_{k-1,3} + c_k$.]

Figure 5: Scheduling Points

First, let us assume that $\tau_k$ is critical, which means that $\tau_k$ must be the last task in any feasible
schedule for $\mu_k$. A schedule for $\mu_k$ is thus a schedule for $\mu_{k-1}$ concatenated with $\tau_k$. Hence, the
optimal schedules for $\mu_k$ can be computed by appending $\tau_k$ to $\sigma_{k-1}(\lambda_{k-1,j})$, $j = 1, \ldots, n_{k-1}$. One
restriction is that $\tau_k$ must be able to execute during its time window, from $r_k$ to $d_k$. Hence, the
scheduling points are $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, n_{k-1}$, subject to the timing constraint of $\tau_k$. In the
example, because $r_k > \lambda_{k-1,1}$, the first scheduling point is $\lambda_{k,1} = r_k + c_k$. The first and the remaining
scheduling points are expressed in Equations 3-5. Notice that $\lambda_{k-1,4} + c_k > d_k$. Hence, there are
only 3 scheduling points for $\mu_k$.

$$\lambda_{k,1} = r_k + c_k \quad \text{and} \quad \sigma_k(\lambda_{k,1}) = \sigma_{k-1}(\lambda_{k-1,1}) \oplus \tau_k \qquad (3)$$
$$\lambda_{k,2} = \lambda_{k-1,2} + c_k \quad \text{and} \quad \sigma_k(\lambda_{k,2}) = \sigma_{k-1}(\lambda_{k-1,2}) \oplus \tau_k \qquad (4)$$
$$\lambda_{k,3} = \lambda_{k-1,3} + c_k \quad \text{and} \quad \sigma_k(\lambda_{k,3}) = \sigma_{k-1}(\lambda_{k-1,3}) \oplus \tau_k \qquad (5)$$

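On the other hand, let us assume that $\tau_k$ is non-critical. As a non-critical task, $\tau_k$ is not necessarily
included in the schedule of $\mu_k$. Whether to include $\tau_k$ or not depends on how much weight may be
gained by including $\tau_k$. If $\tau_k$ is included in the schedules, the new possible scheduling points for $\mu_k$
are expressed in Equations 6-8.

$$\lambda'_{k,1} = r_k + c_k \quad \text{and} \quad \sigma'_k(\lambda'_{k,1}) = \sigma_{k-1}(\lambda_{k-1,1}) \oplus \tau_k \qquad (6)$$
$$\lambda'_{k,2} = \lambda_{k-1,2} + c_k \quad \text{and} \quad \sigma'_k(\lambda'_{k,2}) = \sigma_{k-1}(\lambda_{k-1,2}) \oplus \tau_k \qquad (7)$$
$$\lambda'_{k,3} = \lambda_{k-1,3} + c_k \quad \text{and} \quad \sigma'_k(\lambda'_{k,3}) = \sigma_{k-1}(\lambda_{k-1,3}) \oplus \tau_k \qquad (8)$$

If $\tau_k$ is not included, the scheduling points for $\mu_k$ are $\lambda_{k-1,j}$, $j = 1, \ldots, n_{k-1}$. The scheduling points
for $\mu_k$ can be derived by, first, merging and sorting $\lambda'_k$ and $\lambda_{k-1}$, which gives

$$\lambda_{k-1,1},\ \lambda_{k-1,2},\ \lambda'_{k,1},\ \lambda_{k-1,3},\ \lambda_{k-1,4},\ \lambda'_{k,2},\ \lambda'_{k,3},\ \lambda_{k-1,5}. \qquad (9)$$

Then, the resultant array of scheduling points should follow the rule that the weights of the optimal
schedules at the scheduling points in the resultant array in Equation 9 should be strictly increasing.
We remove any scheduling point that violates this rule.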

3.5 Deriving Scheduling Points

From the example illustrated in Figure 5, $\lambda_k$ can be derived from $\lambda_{k-1}$ and $\tau_k$. Note that a scheduling point indicates the finish time of a schedule. If we want to append $\tau_k$ to $\sigma_{k-1}(\lambda_{k-1,j})$, $\tau_k$ cannot be started before $\lambda_{k-1,j}$. This implies that $\lambda_k$ can be determined by the temporal relations between $\lambda_{k-1}$, the finish times of $\sigma_{k-1}$, and the start time of $\tau_k$. Specifically, we need to explore the temporal relations between the earliest start time, $r_k$, the latest start time, $d_k - c_k$, of $\tau_k$, and the lower and upper bounds defined below. We define the lower bound $L_{k-1} = \lambda_{k-1,1}$ and the upper bound $U_{k-1} = \lambda_{k-1,v_{k-1}}$. In particular, they have the following meanings.

$L_{k-1}$: the largest time instant such that there is no feasible schedule for $\mu_{k-1}$ with finish time less than $L_{k-1}$.

$U_{k-1}$: the least time instant such that the optimal schedule for $\mu_{k-1}$ with finish time greater than $U_{k-1}$ can be $\sigma_{k-1}(U_{k-1})$.

The six possible temporal relations in Equations 10-15 can be used to determine $\lambda_k$.

$d_k - c_k < L_{k-1} \le U_{k-1}$    (10)

$r_k \le L_{k-1} \le d_k - c_k \le U_{k-1}$    (11)

$L_{k-1} \le r_k \le d_k - c_k \le U_{k-1}$    (12)

$r_k \le L_{k-1} \le U_{k-1} \le d_k - c_k$    (13)

$L_{k-1} \le r_k \le U_{k-1} \le d_k - c_k$    (14)

$L_{k-1} \le U_{k-1} < r_k$    (15)

The temporal relations are illustrated in Figure 6, and can be summarized in three cases. The method for constructing scheduling points according to the temporal relations is described below. The correctness of the method, i.e., the completeness and minimization of the scheduling points, is verified later.

3.5.1 $\tau_k$ is Critical

The task $\tau_k$ must be the last task in any feasible schedule of $\mu_k$. Remember that $\sigma_k(t)$ can be computed by Equation 1. In the following, we discuss how to derive the scheduling points for the three cases. Readers may refer to the algorithm in Section 3.7 for details.

Case 1 $d_k - c_k < L_{k-1}$: $\mu$ is not feasible. Remember that there exists no feasible schedule for $\mu_{k-1}$ with finish time less than $L_{k-1}$, due to the completeness of scheduling points, and that $d_k - c_k$ is the latest start time for $\tau_k$. Hence, $\mu_k$ is not feasible, and thus the whole permutation, $\mu$, is not feasible.

[Figure: relation (10) corresponds to case 1, relations (11)-(14) to case 2, and relation (15) to case 3.]

Figure 6: Temporal relations

Case 2 ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} \le r_k \le U_{k-1}$): The scheduling points for $\mu_k$ are the set of $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the constraints that $\tau_k$ must start after $r_k$ and finish before $d_k$. Specifically, $\lambda_k$ can be derived by Equations 16 and 17.

$\lambda_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$    (16)

Let $J_{min}$ and $J_{max}$ denote the smallest and the largest integers $j$ satisfying $\lambda_{k,1} < \lambda_{k-1,j} + c_k \le d_k$. The rest of the scheduling points can be computed by

$\lambda_{k,i} = \lambda_{k-1,j} + c_k$, where $J_{min} \le j \le J_{max}$ and $i = j - J_{min} + 2$    (17)

Note that $v_k = J_{max} - J_{min} + 2$. The example given in Figure 5 falls in this case.

Case 3 $U_{k-1} < r_k$: there is only one scheduling point. Since $r_k$ is the earliest start time for $\tau_k$, the only scheduling point is $r_k + c_k$.
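
The three cases for a critical task can be sketched in Python as follows. This is only an illustrative sketch: the function name `critical_points`, the list representation of $\lambda_{k-1}$, and the numbers in the usage note are assumptions, and the code presumes $r_k + c_k \le d_k$.

```python
from bisect import bisect_right


def critical_points(prev, r, d, c):
    """Scheduling points for mu_k when tau_k is critical.

    prev: sorted finish times lambda_{k-1,1..v} (hypothetical list form).
    Returns (finish_time, j) pairs, where j is the 0-based index of the
    previous scheduling point that tau_k is appended to, or None when the
    permutation is infeasible (case 1). Assumes r + c <= d.
    """
    lower, upper = prev[0], prev[-1]
    if d - c < lower:                 # case 1: tau_k cannot start in time
        return None
    if upper < r:                     # case 3: only one scheduling point
        return [(r + c, len(prev) - 1)]
    # case 2: the first point (max of the two candidate finish times)
    first = max(lower + c, r + c)
    j0 = max(bisect_right(prev, r) - 1, 0)   # greatest j with prev[j] <= r
    points = [(first, j0)]
    # remaining points: first < prev[j] + c <= d
    for j, lam in enumerate(prev):
        if first < lam + c <= d:
            points.append((lam + c, j))
    return points
```

For instance, with the made-up values `prev = [0, 10, 13, 14, 30]`, `r = 5`, `c = 7`, `d = 20`, the fourth previous point gives a finish time of 21, which exceeds the deadline, so only three scheduling points survive, as in the Figure 5 example.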

3.5.2 $\tau_k$ is Non-critical

Remember that $\sigma_k(t)$ can be computed by Equation 2. The non-critical task $\tau_k$ is not necessarily included in the schedule for $\mu_k$. Whether to include $\tau_k$ or not depends on how much weight may be gained by including $\tau_k$. Let us consider the three cases.

Case 1 $d_k - c_k < L_{k-1}$: do nothing. The latest start time of $\tau_k$ is less than the lower bound, $L_{k-1}$; hence, $\tau_k$ cannot be included in any feasible schedule. The scheduling points and schedules for $\mu_k$ remain the same as those for $\mu_{k-1}$. In our implementation, to save time and space, $\lambda_{k-1}$ and $\lambda_k$ use the same memory space; also, $\sigma_{k-1}$ and $\sigma_k$ use the same memory space. So now $\lambda_k = \lambda_{k-1}$ and $\sigma_k = \sigma_{k-1}$.

Case 2 ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} \le r_k \le U_{k-1}$): If $\tau_k$ is included, the new possible scheduling points for $\mu_k$ are the set of $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the constraints that $\tau_k$ must start after $r_k$ and finish before $d_k$. Specifically, the new possible scheduling points, $\lambda'_k$, can be derived by Equations 18 and 19.

$\lambda'_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$    (18)

Let $J_{min}$ and $J_{max}$ denote the smallest and the largest integers $j$ satisfying $\lambda'_{k,1} < \lambda_{k-1,j} + c_k \le d_k$. The rest of the scheduling points are

$\lambda'_{k,i} = \lambda_{k-1,j} + c_k$, where $J_{min} \le j \le J_{max}$ and $i = j - J_{min} + 2$    (19)

If $\tau_k$ is not included, the scheduling points for $\mu_k$ are the old ones for $\mu_{k-1}$; i.e.,

$\lambda_{k-1,j}$, $j = 1, \ldots, v_{k-1}$    (20)

It is worth mentioning that some optimal schedules may include $\tau_k$, and some may not. The scheduling points, $\lambda_k$, can be derived by the following two steps.

1. Merge and sort the two arrays of scheduling points, $\lambda'_k$ and $\lambda_{k-1}$, in Equations 18-20.

2. The resultant array of scheduling points should follow the rule that the weights of the optimal schedules at the scheduling points are strictly increasing. We remove any scheduling point whose weight is not larger than that of its preceding scheduling point in the array.

The example given in Figure 5 falls in this case.

Case 3 $U_{k-1} < r_k$: add one more scheduling point. The earliest start time of $\tau_k$ is greater than the upper bound, $U_{k-1}$; hence, the new scheduling point is $r_k + c_k$. The weight of the optimal schedule computed at this scheduling point is $W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}})) + w_k$, which is larger than $W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}}))$. So this scheduling point must be included to make the set of scheduling points for $\mu_k$ complete. Note again that the scheduling points and schedules for $\mu_{k-1}$ remain unchanged as scheduling points and schedules for $\mu_k$; i.e., $\lambda_{k,j} = \lambda_{k-1,j}$ and $\sigma_k(\lambda_{k,j}) = \sigma_{k-1}(\lambda_{k-1,j})$, for $j = 1, \ldots, v_{k-1}$. However, $\lambda_{k,v_k} = r_k + c_k$ and $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,v_{k-1}}) \oplus \tau_k$, where $v_k = v_{k-1} + 1$.
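
The merge-and-prune step used for a non-critical task can be sketched as below. The function name `merge_and_prune` and the representation of a scheduling point as a `(finish_time, weight)` pair are assumptions made for illustration; the tie-breaking rule (keep the heavier schedule when two points share a finish time) is also an assumption, not something stated in the text.

```python
def merge_and_prune(old, new):
    """Merge two lists of (finish_time, weight) scheduling points.

    old: points carried over from mu_{k-1} (tau_k not included).
    new: candidate points with tau_k included.
    Keeps only points whose weight strictly exceeds that of the
    preceding kept point, so the weights are strictly increasing.
    """
    # Sort by finish time; on ties, consider the heavier schedule first.
    merged = sorted(old + new, key=lambda p: (p[0], -p[1]))
    kept = []
    for finish, weight in merged:
        if not kept or weight > kept[-1][1]:
            kept.append((finish, weight))
    return kept
```

Because the kept weights are strictly increasing, comparing against the last kept point is equivalent to comparing against every earlier one.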

3.6  Completeness  and  Minimization  of  Scheduling  Points 

We would like to show that the sets of scheduling points derived in the three cases are complete and minimum. Note that cases 1 and 3 are special cases, and are not difficult to verify. Hence, we will only briefly discuss case 2. If $\tau_k$ is critical, we would like to show that if $\lambda_{k-1}$ is complete and minimum, $\lambda_k$ derived by Equations 16 and 17 is also complete and minimum.

Condition 1 of completeness: Due to the completeness of $\lambda_{k-1}$, $\Sigma_{k-1}(t)$ is empty when $t < \lambda_{k-1,1}$. Equivalently, $\Sigma_{k-1}(t - c_k)$ is empty when $t < \lambda_{k-1,1} + c_k$. According to Equation 1, $\sigma_k(t) = \sigma_{k-1}(t - c_k) \oplus \tau_k$. Hence, $\sigma_k(t)$ does not exist when $t < \lambda_{k-1,1} + c_k$. On the other hand, since $\tau_k$ is critical, $\sigma_k(t)$ does not exist when $t < r_k + c_k$, which is the earliest finish time of $\tau_k$. Therefore, $\Sigma_k(t)$ is empty when $t < \lambda_{k,1}$. This shows that condition 1 of the definition of completeness is satisfied.

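Condition 2 of completeness: Due to the completeness of $\lambda_{k-1}$, $\sigma_{k-1}(\lambda_{k-1,j}) \in \Sigma_{k-1}(t)$, for any $\lambda_{k-1,j} \le t < \lambda_{k-1,j+1}$. By Equation 1, $\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$ is an optimal schedule at $\lambda_{k-1,j} + c_k$ for $\mu_k$. Hence, $\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k \in \Sigma_k(t)$, for $\lambda_{k-1,j} + c_k \le t < \lambda_{k-1,j+1} + c_k$. By Equation 17, $\lambda_{k,i} = \lambda_{k-1,j} + c_k$, for $i = j - J_{min} + 2$, which indicates that $\sigma_k(\lambda_{k,i}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$. Besides, $\lambda_{k,i+1} = \lambda_{k-1,j+1} + c_k$, for $i + 1 = j + 1 - J_{min} + 2$, by Equation 17. Therefore, $\sigma_k(\lambda_{k,i}) \in \Sigma_k(t)$, for $\lambda_{k,i} \le t < \lambda_{k,i+1}$. This shows that condition 2 of the definition of completeness is satisfied.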

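Condition 3 of completeness: We know that $v_k = J_{max} - J_{min} + 2$. By Equation 17, $\lambda_{k,v_k} = \lambda_{k-1,J_{max}} + c_k$, which indicates that $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,J_{max}}) \oplus \tau_k$. Due to the completeness of $\lambda_{k-1}$, $\sigma_{k-1}(\lambda_{k-1,J_{max}}) \in \Sigma_{k-1}(t)$, for $\lambda_{k-1,J_{max}} \le t < \lambda_{k-1,J_{max}+1}$, or just $\lambda_{k-1,J_{max}} \le t$ if $J_{max} = v_{k-1}$. By Equation 1, $\sigma_{k-1}(\lambda_{k-1,J_{max}}) \oplus \tau_k$ is an optimal schedule at $\lambda_{k-1,J_{max}} + c_k$ for $\mu_k$. Hence, $\sigma_{k-1}(\lambda_{k-1,J_{max}}) \oplus \tau_k \in \Sigma_k(t)$, for $\lambda_{k-1,J_{max}} + c_k \le t$. Note that the upper limit $t < \lambda_{k-1,J_{max}+1} + c_k$ on the range is removed. Because $J_{max}$ is the largest integer $j$ satisfying $\lambda_{k-1,j} + c_k \le d_k$, the schedule $\sigma_{k-1}(\lambda_{k-1,J_{max}+1}) \oplus \tau_k$ would not be feasible. Since $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,J_{max}}) \oplus \tau_k$, $\sigma_k(\lambda_{k,v_k}) \in \Sigma_k(t)$ for $\lambda_{k,v_k} \le t$. This shows that condition 3 of the definition of completeness is satisfied.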

Minimization: By Equation 1, $W(\sigma_k(t)) = W(\sigma_{k-1}(t - c_k) \oplus \tau_k) = W(\sigma_{k-1}(t - c_k))$, since a critical task has no weight. Because $\lambda_{k-1}$ is minimum, $W(\sigma_{k-1}(\lambda_{k-1,j})) < W(\sigma_{k-1}(\lambda_{k-1,j+1}))$, for any $1 \le j \le v_{k-1} - 1$. That is, $W(\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k) < W(\sigma_{k-1}(\lambda_{k-1,j+1}) \oplus \tau_k)$, for any $1 \le j \le v_{k-1} - 1$. By Equations 16 and 17, $W(\sigma_k(\lambda_{k-1,j} + c_k)) < W(\sigma_k(\lambda_{k-1,j+1} + c_k))$, and thus $W(\sigma_k(\lambda_{k,i})) < W(\sigma_k(\lambda_{k,i+1}))$, for any $1 \le i \le v_k - 1$. This shows that $\lambda_k$ is minimum.

If $\tau_k$ is non-critical, $\tau_k$ may or may not be included in the optimal schedules for $\mu_k$. Assuming that $\tau_k$ is not included in any of the optimal schedules, $\lambda_k = \lambda_{k-1}$ is complete, since $\lambda_{k-1}$ is complete. However, including $\tau_k$ may gain some more weight, so we also need to consider the schedules including $\tau_k$. If $\tau_k$ is included in the optimal schedules, $\lambda'_k$ derived by Equations 18 and 19 is the complete set of scheduling points for the optimal schedules including $\tau_k$, for the same reason described for the critical task. Hence, it is sufficient to construct the complete set of $\lambda_k$ by selecting from $\lambda'_k$ and $\lambda_{k-1}$. Since whether to include $\tau_k$ or not does not affect the feasibility of the schedules, we only need to consider the weights of the optimal schedules. A complete set of scheduling points indicates that the weights of the optimal schedules at these scheduling points should be non-decreasing. Furthermore, a complete and minimum set of scheduling points indicates that the weights of the optimal schedules at these scheduling points should be strictly increasing. Hence, we can merge and sort the two arrays $\lambda'_k$ and $\lambda_{k-1}$, and remove any scheduling point whose weight is not larger than that of its preceding scheduling point in the array. The resultant set of scheduling points is thus complete and minimum.

3.7 The Permutation Scheduling Algorithm (PSA)

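Algorithm PSA:

Input: a permutation sequence $\mu = (\tau_1, \tau_2, \ldots, \tau_n)$

Output: an optimal schedule $\sigma_n(\lambda_{n,v_n})$

Initialization: $v_0 = 1$; $\lambda_{0,1} = 0$; $\sigma_0(\lambda_{0,1}) = ()$; $W(\sigma_0(\lambda_{0,1})) = 0$

for k = 1 to n

  when $\tau_k$ is critical

    case 1 ($d_k - c_k < L_{k-1}$) : ($\mu$ is not feasible)
      exit

    case 2 ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} \le r_k \le U_{k-1}$) :
      Computation for the first scheduling point:
        $\lambda_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$
        $j = 1$ if $\lambda_{k-1,1} > r_k$; otherwise, $j$ is the greatest integer such that $\lambda_{k-1,j} \le r_k$
        $\sigma_k(\lambda_{k,1}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma_k(\lambda_{k,1})) = W(\sigma_{k-1}(\lambda_{k-1,j}))$
      Loop: $j = J_{min}$ to $J_{max}$, where $J_{min}$ and $J_{max}$ denote the smallest and the largest integers $j$ satisfying $\lambda_{k,1} < \lambda_{k-1,j} + c_k \le d_k$
        $i = j - J_{min} + 2$
        $\lambda_{k,i} = \lambda_{k-1,j} + c_k$
        $\sigma_k(\lambda_{k,i}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma_k(\lambda_{k,i})) = W(\sigma_{k-1}(\lambda_{k-1,j}))$
      $v_k = J_{max} - J_{min} + 2$

    case 3 ($U_{k-1} < r_k$) : (only one scheduling point)
      $\lambda_{k,1} = r_k + c_k$
      $\sigma_k(\lambda_{k,1}) = \sigma_{k-1}(\lambda_{k-1,v_{k-1}}) \oplus \tau_k$
      $W(\sigma_k(\lambda_{k,1})) = W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}}))$
      $v_k = 1$

  when $\tau_k$ is non-critical

    case 1 ($d_k - c_k < L_{k-1}$) : (do nothing)
      /* Hence, $\lambda_k = \lambda_{k-1}$ and $\sigma_k = \sigma_{k-1}$ */

    case 2 ($r_k \le L_{k-1} \le d_k - c_k$) or ($L_{k-1} \le r_k \le U_{k-1}$) :
      Computation for the first new possible scheduling point:
        $\lambda'_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k)$
        $j = 1$ if $\lambda_{k-1,1} > r_k$; otherwise, $j$ is the greatest integer such that $\lambda_{k-1,j} \le r_k$
        $\sigma'_k(\lambda'_{k,1}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma'_k(\lambda'_{k,1})) = W(\sigma_{k-1}(\lambda_{k-1,j})) + w_k$
      Loop: $j = J'_{min}$ to $J'_{max}$, where $J'_{min}$ and $J'_{max}$ denote the smallest and the largest integers $j$ satisfying $\lambda'_{k,1} < \lambda_{k-1,j} + c_k \le d_k$
        $i = j - J'_{min} + 2$
        $\lambda'_{k,i} = \lambda_{k-1,j} + c_k$
        $\sigma'_k(\lambda'_{k,i}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$
        $W(\sigma'_k(\lambda'_{k,i})) = W(\sigma_{k-1}(\lambda_{k-1,j})) + w_k$
      construct $\sigma_k$ from $\sigma_{k-1}$ and $\sigma'_k$ by
        1. merging and sorting $\lambda_{k-1}$ and $\lambda'_k$ into one array
        2. keeping the weights of the schedules in the resultant array strictly increasing, removing any schedule from the array if necessary

    case 3 ($U_{k-1} < r_k$) : (adding one more scheduling point)
      $v_k = v_{k-1} + 1$
      $\lambda_{k,v_k} = r_k + c_k$
      $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,v_{k-1}}) \oplus \tau_k$
      $W(\sigma_k(\lambda_{k,v_k})) = W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}})) + w_k$
      /* Note that $\lambda_{k,j} = \lambda_{k-1,j}$ and $\sigma_k(\lambda_{k,j}) = \sigma_{k-1}(\lambda_{k-1,j})$ for $j = 1$ to $v_{k-1}$ */

endfor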
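
As a rough illustration of how the listing above might be realized, the following Python sketch keeps, for each prefix of the permutation, a list of scheduling points. It is not the authors' implementation: the `Task` record, the tuple layout `(finish_time, weight, accepted_names)`, and the function name `psa` are all assumptions, and every task is presumed to satisfy $r_k + c_k \le d_k$.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical task record: ready time r, deadline d, computation time c,
# weight w (ignored for critical tasks), and a criticality flag.
Task = namedtuple("Task", "name r d c w critical")


def psa(perm):
    """Return (finish_time, weight, accepted_names) of the best schedule
    for the permutation, or None if a critical task cannot be placed."""
    pts = [(0, 0, ())]                        # lambda_{0,1} = 0, empty schedule
    for t in perm:
        times = [p[0] for p in pts]
        lower, upper = times[0], times[-1]    # L_{k-1} and U_{k-1}
        if t.d - t.c < lower:                 # case 1
            if t.critical:
                return None                   # permutation is infeasible
            continue                          # non-critical: do nothing
        gain = 0 if t.critical else t.w

        def extend(j, finish):
            # Append the current task to the schedule at previous point j.
            _, weight, accepted = pts[j]
            return (finish, weight + gain, accepted + (t.name,))

        if upper < t.r:                       # case 3
            new = [extend(len(pts) - 1, t.r + t.c)]
        else:                                 # case 2
            first = max(lower + t.c, t.r + t.c)
            j0 = max(bisect_right(times, t.r) - 1, 0)
            new = [extend(j0, first)]
            new += [extend(j, lam + t.c) for j, lam in enumerate(times)
                    if first < lam + t.c <= t.d]
        if t.critical:
            pts = new                         # tau_k must end every schedule
        else:
            # Merge old and new points, keep weights strictly increasing.
            merged = sorted(pts + new, key=lambda p: (p[0], -p[1]))
            pts = []
            for p in merged:
                if not pts or p[1] > pts[-1][1]:
                    pts.append(p)
    return pts[-1]                            # sigma_n(lambda_{n,v_n})
```

The last scheduling point carries the largest weight because the weights are kept strictly increasing, so it plays the role of the algorithm's output.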

4  Scheduling  a  Task  Set 

To find an optimal schedule for the task set, we may have to consider all possible ($n!$) permutations. It is possible to reduce the search space by eliminating some infeasible permutations. For example, if $d_i < r_j$, there is no feasible schedule in which $\tau_i$ is placed after $\tau_j$. Even after the reduction, the search space might still be too large. We propose to use the simulated annealing technique, recognizing that while this technique reduces the search, it may yield sub-optimal results.

4.1  Simulated  Annealing 

Simulated annealing is a stochastic approach for solving large optimization problems. It was developed using ideas from statistical mechanics to find a global minimum point in the energy space. Kirkpatrick et al. [5] demonstrated the power and applications of simulated annealing in the field of combinatorial optimization.

Finding the optimal solution of an optimization problem is similar to finding the lowest energy state of a metal. The metal is melted first. Then it is cooled down slowly until the freezing point is reached. At each temperature, a number of trials are carried out to reach equilibrium. The temperature must be controlled so that it does not drop too quickly; otherwise, the process may be trapped in a local minimum energy configuration. Lower energy generally indicates a better solution. The annealing process starts from a randomly chosen configuration and proceeds to seek potentially promising neighbor configurations. A neighbor configuration is derived by perturbing the current configuration. If the neighbor configuration has a lower energy, the change is always accepted. The distinctive feature is that a neighbor configuration with a higher energy can also be accepted, with probability $e^{(E - E')/T}$, where $T$ is the temperature and $E - E'$ represents the difference in energy between the current and neighbor configurations. Notice that when the temperature is high, an up jump is more likely than when the temperature is low, as it may reach a configuration which, although of higher energy, may lead to a better solution. An up jump means a jump from low energy to high energy, and a down jump means a jump from high energy to low energy.
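
The acceptance rule can be sketched as follows, assuming the standard Metropolis form $e^{(E - E')/T}$ for the probability of an up jump. The function name `accept` and the injectable `rng` parameter are illustrative assumptions.

```python
import math
import random


def accept(e_cur, e_new, temperature, rng=random):
    """Decide whether to move from energy e_cur to energy e_new.

    A down jump (lower energy) is always accepted. Otherwise the move is
    accepted with probability exp((e_cur - e_new) / temperature), which
    shrinks as the temperature drops or the energy increase grows.
    """
    if e_new < e_cur:
        return True
    return math.exp((e_cur - e_new) / temperature) > rng.random()
```

With an energy increase of 10, the acceptance probability is about 0.99 at temperature 1000 and about 0.00005 at temperature 1, which is why up jumps become rare as the system cools.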

4.2  The  Set  Scheduling  Algorithm  (SSA) 

A permutation is used to represent the configuration. If a permutation is ordered in an Earliest Deadline First (EDF) fashion, we call it an EDF permutation. An EDF permutation may be a good starting permutation for the process of simulated annealing for this problem. If the window of a task is contained in the window of another task, we say that the latter task contains the former task. If there are no containing relations among tasks, the EDF permutation is a permutation of which an optimal schedule of the task set is a subsequence [4]. Thus, an optimal schedule for the task set can be generated by using PSA to schedule the EDF permutation. The energy function can be expressed by a loss function:

loss = $\sum$ (weight of rejected non-critical tasks)

A schedule is not acceptable if critical tasks are rejected. We may say that the loss of a rejected critical task is infinity. However, this kind of assignment makes it difficult to distinguish between a very bad schedule (e.g., one critical task is rejected) and an even worse schedule (more critical tasks are rejected). In general, the former schedule can be considered an improvement over the latter. If the loss incurred by a rejected critical task is assigned infinity, there is no way to tell which is better between a schedule in which one critical task is rejected and one in which three critical tasks are rejected. Hence, we assign a finite amount of loss to rejected critical tasks. The loss of a critical task must be large enough that the scheduler will not reject a critical task to accommodate a number of non-critical tasks.
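
A minimal sketch of such a loss function is shown below. The function name `loss` and the `(weight, is_critical)` input format are assumptions; the default penalty of 1000 is the value the experiments in Section 5 report for a rejected critical task.

```python
def loss(rejected, critical_penalty=1000):
    """Energy of a schedule, computed from its rejected tasks.

    rejected: iterable of (weight, is_critical) pairs.
    A rejected non-critical task costs its weight; a rejected critical
    task costs a large but finite penalty, so schedules that reject
    fewer critical tasks still compare as better.
    """
    return sum(critical_penalty if is_critical else weight
               for weight, is_critical in rejected)
```

With a finite penalty, a schedule rejecting one critical task has a strictly lower loss than one rejecting three, which is the distinction an infinite loss would erase.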

The neighbor function may be obtained using one of the following two methods. In the first, simple method, we randomly select one task from those rejected. This task is inserted in a randomly chosen location within a specified distance from its original location, where the distance is the number of tasks between two tasks in a permutation. The distance is used in this approach to control the degree of perturbation.

A task is rejected because other tasks have been accepted. Given a schedule for a permutation, it is sometimes difficult to identify which task results in the rejection of other tasks, especially when tasks are congested together. However, the task immediately before or after those rejected is likely to play a role. In the second method, we try to identify the task which causes the largest loss of weight. As a simple approach, we attribute the rejection of a task to the task accepted prior to it. Then we choose the task which causes the largest loss of weight and insert it within a specified distance. Due to the robustness of the simulated annealing technique, the impact of not necessarily selecting the task which caused the largest loss is minimal. Note that in simulated annealing many parameters are randomized, and the energy function, together with the temperature, controls the progress of the annealing process. Tindell et al. [9] commented that the great beauty of simulated annealing lies in that you only need to describe what constitutes a good solution without worrying about how to reach it. According to our experiments, we find that the first method performs better than the second method. However, the process in the first method sometimes falls into a local minimum. The combination of the two methods performs better than either method individually. The Set Scheduling Algorithm (SSA) is presented in Figure 7.

The initial temperature has to be large enough that virtually all up jumps are allowed at the beginning of the annealing process. According to [9], the new temperature is computed as new temperature = $\alpha$ * current temperature, where $0 < \alpha < 1$. A step denotes an iteration of the inner loop in Figure 7, which is the process of scheduling a permutation and determining whether the permutation would become the current permutation. Thermal equilibrium is reached when a certain number of down jumps or a certain number of total steps has been observed; and the freezing point, or the stopping condition, is reached when no further down jump has been observed in a certain number of steps [5, 9].

Algorithm SSA:

Begin
  choose initial temperature $T$
  choose the EDF permutation as the starting permutation, $\mu$
  schedule $\mu$ by PSA and compute its energy, $E$
  loop
    loop
      compute neighbor permutation $\mu'$
      schedule $\mu'$ by PSA and compute its energy, $E'$
      if $E' < E$ then
        make $\mu'$ the current permutation: $\mu \leftarrow \mu'$ and $E \leftarrow E'$
      else
        if $e^{(E - E')/T}$ > random(0,1) then
          make $\mu'$ the current permutation: $\mu \leftarrow \mu'$ and $E \leftarrow E'$
        else
          $\mu$ remains the current permutation
    until thermal equilibrium is reached
    compute new temperature: $T \leftarrow \alpha * T$
  until stopping condition is reached
End

Figure 7: Set Scheduling Algorithm
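
The annealing loop can be sketched generically in Python as below. This is a sketch under stated assumptions, not the authors' code: the function name `ssa`, the callables `energy` and `neighbor`, the tracking of the best permutation seen, the temperature floor, and the early exit at zero energy are all illustrative choices. The default parameter values are the ones reported in Section 5. In the paper's setting, `energy` would schedule the permutation with PSA and return its loss.

```python
import math
import random


def ssa(start, energy, neighbor, t0=3000.0, alpha=0.8,
        eq_down=25, eq_steps=300, freeze=2000, rng=None):
    """Simulated annealing over permutations.

    energy(perm) -> number; neighbor(perm, rng) -> perturbed permutation.
    Returns the best permutation seen and its energy.
    """
    rng = rng or random.Random(0)
    cur, e = start, energy(start)
    best, e_best = cur, e
    temp, idle = t0, 0              # idle: steps since the last down jump
    while idle < freeze and e_best > 0:
        down = steps = 0
        # Inner loop: run until thermal equilibrium at this temperature.
        while (down < eq_down and steps < eq_steps
               and idle < freeze and e_best > 0):
            cand = neighbor(cur, rng)
            e_cand = energy(cand)
            steps += 1
            if e_cand < e:          # down jump: always accepted
                cur, e = cand, e_cand
                down += 1
                idle = 0
                if e < e_best:
                    best, e_best = cur, e
            else:                   # up jump: accepted with some probability
                idle += 1
                if math.exp((e - e_cand) / temp) > rng.random():
                    cur, e = cand, e_cand
        temp = max(temp * alpha, 1e-12)   # geometric cooling, floored
    return best, e_best
```

Passing `energy` and `neighbor` as parameters keeps the loop independent of how a permutation is scheduled and perturbed, so either of the two neighbor methods described above could be plugged in.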
5 Experiment Result

Experiments are conducted to study the performance of SSA based on:

• scheduling ability = (number of times that the algorithm generates a feasible schedule) / (number of times that there does exist a feasible schedule for the task set)

• loss ratio = (loss of the schedule generated by SSA - loss of an optimal schedule) / (total weight of accepted non-critical tasks of an optimal schedule)

• iterations = number of permutations that the simulated annealing algorithm goes through to obtain the sub-optimal schedule
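
The first two measures are simple ratios and can be written out as below. The function and parameter names are assumptions chosen for illustration.

```python
def scheduling_ability(found_feasible, feasible_exists):
    """Fraction of feasible task sets for which the algorithm
    actually produced a feasible schedule."""
    return found_feasible / feasible_exists


def loss_ratio(loss_ssa, loss_opt, accepted_weight_opt):
    """Extra loss of the generated schedule, relative to the total weight
    of non-critical tasks accepted by an optimal schedule."""
    return (loss_ssa - loss_opt) / accepted_weight_opt
```

As an illustration with made-up counts, finding feasible schedules for 197 of 200 feasible task sets gives a scheduling ability of 0.985. Because a rejected critical task contributes a large finite loss, the loss ratio of an infeasible schedule can exceed 1.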

We start with an EDF permutation. To study how good the result would be when PSA is used to schedule only the EDF permutation, the scheduling ability and loss ratio for the EDF permutation are computed as well. In our experiments, a task set consists of 100 tasks. The number of permutations of such a task set is $100! \approx 9.33 \times 10^{157}$. To study how good the output of SSA is compared to an optimal schedule, it is impractical to go through such a great number of permutations for a task set to derive the optimal schedule and its minimum loss for comparison. Instead, we choose to make up a task set such that the task set is feasible and the loss of its optimal schedule is 0. Although the SSA algorithm is primarily designed for an overloaded system, we apply SSA to such task sets to measure its performance. The parameters are shown in Figure 8.


parameters          value                     type
window length       mean_WL = 20.0            truncated normal distribution
computation time    mean_C = mean_WL / 3      truncated normal distribution
load                20%, 40%, 60%, 80%        constants
criticality ratio   25%, 50%, 75%             constants
weight              low_W = 1, high_W = 50    discrete uniform distribution

Figure 8: Parameters of the experiments


The mean of window length, mean_WL, is set to be 20 time units. The load is the ratio of total computation time to the largest deadline, D, in the task set. Hence, the load indicates the difficulty of scheduling the task set. The mean of computation time, mean_C, is one third of the mean of window length, which allows the windows among tasks to overlap to some extent. How much the windows overlap partially depends on the load. If the load is high, the windows are congested together, and thus the overlapping is high. We expect some containing relations between tasks to occur and thus increase the difficulty of scheduling. Note that, without containing relations, scheduling the task set would be straightforward. The standard deviations of window length and computation time are set to be their respective means. Criticality ratio indicates the percentage of critical tasks in the task set. It is set to be 25%, 50%, and 75%. The higher the criticality ratio, the more difficult it is to generate a feasible schedule for the task set. On the other hand, although it is easier to come up with a feasible schedule when the criticality ratio is low, the loss ratio may still be high. It may be necessary to go through many permutations before an acceptable loss ratio is reached. In our experiments, the acceptable loss ratio is set to be 0%, which means that SSA will keep trying different permutations until either the loss ratio is 0 or the stopping condition is reached, in which case SSA fails to find an optimal schedule. Note that a big energy (loss), 1000, is incurred for a rejected critical task. Hence, for an infeasible schedule, the loss ratio may well be more than 100%. The weight of a non-critical task is an integer ranging from low_W = 1 to high_W = 50, determined by a discrete uniform distribution function. For each individual experiment with different parameters, 200 task sets, each with 100 tasks, are generated for scheduling. The way of creating a feasible task set without loss is described in Appendix A.

From Figure 9a, the scheduling ability of SSA is 98.5% when the criticality ratio is 75% and the load is 80%, and is 100% for the other, lower criticality ratios and loads. This is because the simulated annealing algorithm focuses on searching for suitable neighbor permutations in such a way that the rejected critical tasks, if any, may be accepted. Note that scheduling only the EDF permutation cannot always generate a feasible schedule. The scheduling ability of scheduling the EDF permutation degrades when the load increases, which means the tasks are more congested. It also degrades when the criticality ratio increases, which makes meeting the deadlines of all critical tasks more difficult.

As far as non-critical tasks are concerned, SSA cannot guarantee the minimum loss. However, even in the worst case given in Figure 9b, the loss ratio is less than 10%. The loss ratio becomes smaller when the criticality ratio or the load is lower. In many cases, the loss ratios are less than 5%. As for scheduling the EDF permutation, the loss ratios are significantly larger.

The number of permutations to be searched in simulated annealing depends on the pattern of energy jumps, the way the temperature is reduced, and how we define thermal equilibrium and the stopping condition. In the experiments, we find that reducing the temperature faster does not have a negative impact on the scheduling ability and loss. How to set the parameters in simulated annealing differs a great deal from one application to another. We want to generate as good a result as possible, but are not willing to spend more computation time than necessary. This usually requires fine tuning the parameters to balance the two goals. We find that the following parameters work well: initial temperature = 3000, $\alpha$ = 0.8 (instead of 0.95 or even 0.99 suggested in other applications), the number of down jumps to obtain thermal equilibrium = 25, the number of total steps to obtain thermal equilibrium = 300, and the number of steps with no further down jump to obtain the freezing point = 2000, which is also the stopping condition. The average number of permutations searched in simulated annealing is given in Figure 9c. If SSA can successfully generate a feasible schedule, the average number of permutations checked is no more than 4000. The number increases a little if SSA fails to find a feasible schedule, because in this case SSA does not stop until the freezing point is reached. Note that the average numbers of permutations are far smaller than the 100! permutations of a task set, which can roughly give us an idea about the complexity of searching over the permutation space. Additional studies have shown that if we modify the above parameters to increase the average number of permutations by about 10 times, the loss ratios can be further reduced by about 25% of the loss ratios obtained here.

If time can be expressed in integers, the dynamic programming technique used in PSA can be applied by computing $\sigma_k(t)$ at $t = 1, \ldots, D$. Let us call this approach the integral PSA, compared to the original PSA with scheduling points, denoted by PSA SP in Figure 9d. Obviously, the integral PSA tends to compute more schedules than the original PSA. We would like to see how much more efficient the original PSA algorithm is than the integral PSA. Specifically, we compare the average number of schedules required to derive the optimal schedule for a permutation. For the integral PSA, the number of schedules computed is fixed, namely $n * D$, as can be seen in Figure 1. For the original PSA, $\sum_{k=1}^{n} v_k$ is the number of schedules needed to schedule a permutation. The average number of schedules needed to schedule a permutation by PSA is computed over the permutations of a task set, and is presented in Figure 9d. The number for the original PSA decreases with the criticality ratio. This is because a critical task never increases the number of scheduling points; instead, the number of scheduling points might be decreased due to the timing constraint of the critical task. For the criticality ratios of 0.25, 0.50, and 0.75, the average numbers of schedules required for a task set of 100 tasks are approximately 480, 250, and 150, respectively. The complexity of the original PSA seems linear in this sense. On the other hand, the complexity of the integral PSA is quite high. The number decreases with load. This happens to be related to the way of generating the task set, in which D = total_c / load. The number is equal to $n * D$, where D might fluctuate a little.

6  Conclusion 

In this paper, we study the scheduling problem for a real-time system which is overloaded. A
significant performance degradation may be observed in the system if the overload problem is not
addressed properly [2]. As not all the tasks can be processed, the set of tasks selected for processing
is crucial for the proper operation of an overloaded system. We assign to the tasks criticalities and
weights on the basis of which the tasks are selected. The objective is to generate an optimal
schedule for the task set such that all of the critical tasks are accepted, and then the loss of weights
of non-critical tasks is minimized.

We  present  a  two  step  process  for  generating  a  schedule.  First,  we  develop  a  schedule  for 
a  permutation  of  tasks  using  a  pseudo-polynomial  algorithm.  The  concept  of  scheduling  points 
is  proposed  for  the  algorithm.  In  order  to  find  the  optimal  schedule  for  the  task  set,  we  have  to 
consider  all  permutations.  The  simulated  annealing  technique  is  used  to  limit  the  search  space  while 
obtaining optimal or near optimal results. Our experimental results indicate that the approach is
very efficient.

The work presented in this paper can be easily extended to address the overload issue for
periodic tasks. To schedule a set of periodic tasks with criticalities and weights, we can convert
the periodic tasks in the time frame of the least common multiple of the task periods to aperiodic
tasks. The schedule generated for the frame can be applied repeatedly for the subsequent time
frames.
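The conversion described above can be sketched as follows. This is only an illustration under stated assumptions: the class and field names are hypothetical, time is taken to be integral, and each instance is assumed to be ready at the start of its period with its deadline at the end of that period (the text does not fix these details).

```python
from dataclasses import dataclass
from math import lcm  # available in Python 3.9+


@dataclass(frozen=True)
class PeriodicTask:  # hypothetical container, not from the paper
    name: str
    period: int
    computation: int
    critical: bool
    weight: int = 0


@dataclass(frozen=True)
class AperiodicTask:  # hypothetical container, not from the paper
    name: str
    ready: int
    deadline: int
    computation: int
    critical: bool
    weight: int


def unroll(tasks):
    """Expand periodic tasks into aperiodic instances over one frame.

    The frame length is the least common multiple of the periods.
    Assumption: instance j of a task is ready at j * period and must
    finish by (j + 1) * period.
    """
    frame = lcm(*(t.period for t in tasks))
    jobs = []
    for t in tasks:
        for j in range(frame // t.period):
            jobs.append(AperiodicTask(
                name=f"{t.name}#{j}",
                ready=j * t.period,
                deadline=(j + 1) * t.period,
                computation=t.computation,
                critical=t.critical,
                weight=t.weight,
            ))
    return frame, jobs
```

The resulting list of aperiodic instances would then be handed to the scheduler once, and the schedule found for the frame reused for every later frame.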

Our algorithm can also be applied to solving the problem of scheduling imprecise computations
[7], in which a task is decomposed logically into a mandatory subtask, which must finish before
the deadline, and an optional subtask, which may not finish. The goal is to find a schedule such
that the mandatory subtasks can all be finished by their deadlines and the sum of the computation
times of the unfinished optional subtasks is minimum. A schedule satisfies the 0/1 constraint if
every optional subtask is either completed or discarded [7]. We can solve this problem by using
our algorithm by setting the mandatory subtasks to be critical, and the optional subtasks to be
non-critical with weights equal to their computation times.
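A minimal sketch of this mapping is given below. The dictionary keys and the function name are hypothetical, and the sketch only builds the task list; it does not model the precedence between a mandatory subtask and its optional part, which a real scheduler would still have to respect.

```python
def to_overload_tasks(imprecise_tasks):
    """Map imprecise computations to critical / non-critical tasks.

    Each input item is a dict with the (hypothetical) keys
    name, ready, deadline, mandatory, optional.
    Mandatory subtasks become critical tasks; optional subtasks become
    non-critical tasks whose weight equals their computation time.
    """
    tasks = []
    for t in imprecise_tasks:
        window = {"ready": t["ready"], "deadline": t["deadline"]}
        tasks.append({
            "name": t["name"] + ".mandatory",
            "computation": t["mandatory"],
            "critical": True,
            "weight": 0,
            **window,
        })
        if t["optional"] > 0:
            tasks.append({
                "name": t["name"] + ".optional",
                "computation": t["optional"],
                "critical": False,
                "weight": t["optional"],  # weight = computation time
                **window,
            })
    return tasks
```

With these weights, minimizing the total weight of rejected non-critical tasks is the same as minimizing the total computation time of discarded optional subtasks under the 0/1 constraint.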

Appendix  A.  Generating  a  task  set 

Generate computation times for tasks according to mean_C and the standard deviation

D = (total computation time) / load

Assign starting instants, s_k, to tasks such that
    the intervals between the computation times are truncated normally distributed

For each task τ_k
    Determine the criticality by criticality_ratio and/or weight by low_W and high_W
    Compute the window length of τ_k according to mean_Wl and the standard deviation
        (note that window length > c_k)
    Align the window with the computation time at their middle points:
        r_k = max(0, s_k + c_k/2 - window_length/2)
        d_k = min(D, r_k + window_length)
The load determines how congested the tasks are. Once the largest deadline, D, has been
computed, we separate the computation times of the tasks in such a way that the positions of the
computation times on the time axis stretch over the range from 0 to D. Note that the starting
instants of the computation times constitute an optimal schedule for the task set. In this way, all of
the tasks in the task set can be accepted. Finally, the windows are aligned with the computation
times.
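The generation procedure can be sketched in Python as follows. This is an illustrative reading, not the authors' generator: the function and parameter names are hypothetical, the truncated normal distribution is obtained by simple resampling, the gap distribution parameters (mean 1.0, deviation 0.5 before rescaling) are arbitrary, the load is assumed to lie in (0, 1], and the midpoint alignment is assumed to mean that the centre of the window coincides with the centre of the computation interval.

```python
import random


def truncated_gauss(rng, mean, sd, lower):
    """Resample a normal variate until it exceeds `lower`."""
    while True:
        x = rng.gauss(mean, sd)
        if x > lower:
            return x


def generate_task_set(n, load, mean_c, sd_c, mean_wl, sd_wl,
                      criticality_ratio, low_w, high_w, seed=0):
    if not 0 < load <= 1:
        raise ValueError("this sketch assumes 0 < load <= 1")
    rng = random.Random(seed)
    # Computation times, then the largest deadline D.
    c = [truncated_gauss(rng, mean_c, sd_c, 0.0) for _ in range(n)]
    total_c = sum(c)
    D = total_c / load
    # Gaps between computation times, rescaled to fill the slack D - total_c.
    raw = [truncated_gauss(rng, 1.0, 0.5, 0.0) for _ in range(n + 1)]
    scale = (D - total_c) / sum(raw)
    tasks, t = [], 0.0
    for k in range(n):
        t += raw[k] * scale
        s = t                      # starting instant of the computation
        t += c[k]
        critical = rng.random() < criticality_ratio
        weight = 0 if critical else rng.randint(low_w, high_w)
        wl = truncated_gauss(rng, mean_wl, sd_wl, c[k])  # window > c_k
        r = max(0.0, s + c[k] / 2 - wl / 2)              # ready time
        d = min(D, r + wl)                               # deadline
        tasks.append({"s": s, "c": c[k], "r": r, "d": d,
                      "critical": critical, "weight": weight})
    return D, tasks
```

Because every window contains the computation interval it was aligned with, running each task at its starting instant s is a feasible schedule that accepts all tasks.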


References 

[1] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of
NP-Completeness. W. H. Freeman and Company, San Francisco, 1979.

[2] Jayant R. Haritsa, Miron Livny, and Michael J. Carey. Earliest deadline scheduling for real-time
database systems. In IEEE Real-Time Systems Symposium, Dec. 1991.

[3] Shyh-In Hwang, Sheng-Tzong Cheng, and Ashok K. Agrawala. An optimal solution for scheduling
real-time tasks with rejection. In International Computer Symposium, Dec. 1994.

[4] Shyh-In Hwang, Sheng-Tzong Cheng, and Ashok K. Agrawala. Optimization in non-preemptive
scheduling for aperiodic tasks. Technical Report CS-TR-3216, UMIACS-TR-94-14, Department
of Computer Science, University of Maryland at College Park, Jan. 1994.

[5] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science,
220, pages 671-680, 1983.

[6] C. L. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time
environment. Journal of the ACM, 20(1):46-61, Jan. 1973.

[7] W.K. Shih, J. Liu, and J.Y. Chung. Fast algorithms for scheduling imprecise computations. In
IEEE Real-Time Systems Symposium, pages 12-19, Dec. 1989.

[8] Philip Thambidurai and Kishor S. Trivedi. Transient overloads in fault-tolerant real-time systems.
In IEEE Real-Time Systems Symposium, Dec. 1989.

[9] K.W. Tindell, A. Burns, and A.J. Wellings. Allocating hard real-time tasks: An NP-hard
problem made easy. The Journal of Real-Time Systems, 4(2):145-165, June 1992.


REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

2. Report Date: November
3. Report Type and Dates Covered: Technical Report
4. Title and Subtitle: Scheduling an Overloaded Real-Time System
5. Funding Numbers: N00014-91-C-0195, DASG-60-92-C-0055
6. Author(s): Shyh-In Hwang, Chia-Mei Chen, and Ashok K. Agrawala
7. Performing Organization Name(s) and Address(es): University of Maryland, Department of Computer
   Science, A. V. Williams Building, College Park, MD 20742
8. Performing Organization Report Number: CS-TR-3377, UMIACS-TR-94-128
9. Sponsoring/Monitoring Agency Name(s) and Address(es): Honeywell, Inc., 3600 Technology Drive,
   Minneapolis, MN 55418; Phillips Laboratory, Directorate of Contracting, 3651 Lowry Avenue SE,
   Kirtland AFB, NM 87117-5777
13. Abstract: Real-time systems differ from conventional systems in that every task in
    a real-time system has a timing constraint. Failure to execute the tasks under their
    timing constraints may result in fatal errors. Sometimes, it may be impossible to
    execute all the tasks in the task set under their timing constraints. Considering a
    system with limited resources, one solution to handle the overload problem is to
    reject some of the tasks in order to generate a feasible schedule for the rest. In
    this paper, we consider the problem of scheduling a set of tasks without preemption in
    which each task is assigned criticality and weight. The goal is to generate an optimal
    schedule such that all of the critical tasks are scheduled and then the non-critical
    tasks are included so that the weight of rejected non-critical tasks is minimized.
    We consider the problem of finding the optimal schedule in two steps. First, we select
    a permutation sequence of the task set. Secondly, a pseudo-polynomial algorithm is
    proposed to generate an optimal schedule for the permutation sequence. If a global
    optimum is desired, all permutation sequences have to be considered. Instead, we
    propose to incorporate the simulated annealing technique to deal with the large search
    space.
14. Subject Terms: Process Management; Nonnumerical Algorithms and Problems
15. Number of Pages: 29
17-19. Security Classification (of report, of this page, of abstract): Unclassified
20. Limitation of Abstract: Unlimited

NSN 7540-01-280-5500


Notes on Symbol Dynamics


Ashok K. Agrawala

Department of Computer Science, University of Maryland
College Park, Maryland 20742

E-mail: agrawala@cs.umd.edu
Christopher Landauer

System Planning and Development Division, The Aerospace Corporation
The Hallmark Building, Suite 187, 13873 Park Center Road, Herndon, Virginia 22071
Phone: (703) 318-1666, FAX: (703) 318-5409

E-mail: cal@aero.org
13 February 1995

Abstract 

This  paper  introduces  a  new  formulation  of  dynamic  systems  that  subsumes  both  the  classical  discrete  and  differential 
equation  models  as  well  as  current  trends  in  hybrid  models.  The  key  idea  is  to  express  the  system  dynamics  using 
symbols  to  which  the  notion  of  time  is  explicitly  attached.  The  state  of  the  system  is  described  using  symbols  which 
are  active  for  a  defined  period  of  time.  The  system  dynamics  is  then  represented  as  relations  between  the  symbolic 
representations. 

We  describe  the  notation  and  give  several  examples  of  its  use. 

1 Introduction


Traditionally, systems have been modelled using state variables defined in a metric space and the system dynamics
defined using differential equations. This approach uses continuous descriptions of space and time. When we use
computers for expressing and manipulating such models we have to use symbols to represent them. Symbols are discrete
by their very nature, and require the use of mappings from the continuous spaces to discrete spaces. These mappings
cause problems unless carried out rather carefully. Further, when we consider problems in which some aspects
of the system are genuinely discrete, hybrid models have been used. As different techniques have to be used for
continuous and discrete aspects of the system, significant complexity gets added to such models.

Recognizing that computer systems only use symbols for any representations, in this paper we present a formulation
of system dynamics directly in terms of symbols. In order to handle the dynamics, the time interval over
which a symbol is considered valid is explicitly attached. The symbols describing different aspects of the system
may be from a set appropriate for that aspect. The dynamics is described in terms of rules connecting the symbolic
representations.

This  paper  contains  the  preliminary  formulation  of  system  dynamics  in  the  framework  of  Symbol  Dynamics. 

2  Descriptions  of  System  Behavior 

For  the  purposes  of  this  paper,  behavior  includes  all  the  relationships  among  parts  of  a  system  at  the  same  or  different 
times.  In  particular,  the  combined  relationships  among  parts  of  a  system  at  the  same  time  is  usually  called  structure. 
Both  of  these  aspects  are  subsumed  in  our  use  of  the  term  behavior. 

We  assume  that  our  ability  to  generate  or  derive  new  information  about  the  system  behavior  changes  only  at  discrete 
points  in  time,  since  we  expect  to  perform  these  processes  on  digital  computers.  The  event  times  define  the  time 
scale. In this paper, we introduce Symbol Dynamics, a totally symbolic way to represent the important aspects of
dynamical  systems  and  processes,  so  that  we  can  reason  about  them  using  computers. 

3  Concepts  and  Notations 

This  section  contains  the  basic  notions  of  Symbol  Dynamics. 

3.1  State  Variable 

We  assume  that  systems  exist  and  change  over  time.  We  are  looking  for  a  method  of  describing  those  changes  so  we 
can  compute  how  to  control  them. 

The  systems  we  consider  can  be  described  with  state  variables.  Each  state  variable  is  an  observation  on  the  system 
or  a  derivation  from  other  state  variables. 

We  may  or  may  not  know  a  priori  which  state  variables  are  important,  or  even  which  ones  are  determinable  (i.e.,  the 
system  comes  first,  and  the  state  variables  are  chosen  to  be  helpful  in  describing  the  behavior).  We  might  call  the 
state  variables  attributes  of  the  state. 

3.2  Symbol 

We want to measure and compute with information about a system, so we need to map the system into formal spaces
we understand better.

A type is a symbol set, both representing a set of values and including some operations on those values; this is the
notion of formal space used here. It includes collections of mutually dependent types and functions between different
types.

A symbol of a given type is an element of the set of values of that type. Any notions of credibility, confidence, or
uncertainty are part of the type system that is used. It is especially important to define the allowable operations on
these kinds of types. For example, for measurements of a system, the symbol would include the measured value and
the associated uncertainty value.

3.3  Attribute  Identifier 

We  assume  that  we  will  want  to  know  different  things  about  the  system  behavior.  We  need  names  to  keep  track  of 
the  different  things  we  measure  or  compute. 

An attribute identifier is a name for a state variable (a state variable is like a probe into some aspect of the system
behavior,  and  the  attribute  identifier  is  only  the  label). 

3.4  Expression 

An  expression  is  a  pair 

(attribute identifier: symbol),

which  is  interpreted  to  mean  the  assertion  that  the  state  variable  can  be  described  by  the  symbol  (when  the  expression 
is  active).  We  will  describe  the  precise  semantics  of  these  expressions  later  on. 

These  are  models  of  the  state  variable  values. 

3.5  Interval 

An  interval  is  a  pair 

[start time, end time),

assumed to describe a half-open interval (to save us from trouble with the topology). The end time may be omitted,
in which case it is interpreted to mean infinity by default.

3.6  Characterizer 

A  characterizer  is  a  pair 

(expression,  interval), 
also  written 

(attribute  identifier:  symbol;  start  time,  end  time), 

interpreted to mean that the expression is active during the specified interval. It becomes active at the start time,
and becomes inactive at the end time. Each characterizer has a range (its interval of activity) and a scope (the set
of attribute identifiers that occur in its expression).

We may also consider a symbol set that includes arithmetic expressions that contain an explicit time variable t. For
example,

(p : p0 + v0 * t; t0, t1)

represents a continuous change along the interval.

We will also have occasion to reason about conditions at particular points in time, so the assertion language will also
have characterizers of the form
(expression,  point). 

3.7  Event 

An event is the activation or deactivation of a characterizer. We make no limiting assumptions about simultaneous
events.
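The notions of this section can be mirrored in a small data structure. The sketch below is only one possible encoding, with hypothetical class and function names: an interval is half-open with an optional end time (infinity by default), a characterizer pairs an expression (attribute identifier and symbol) with an interval, and the event times are the activation and deactivation instants.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end); the end defaults to infinity."""
    start: float
    end: float = math.inf

    def __post_init__(self):
        if not self.start < self.end:
            raise ValueError("interval must have positive length")

    def contains(self, t):
        return self.start <= t < self.end


@dataclass(frozen=True)
class Characterizer:
    """(attribute identifier: symbol; start time, end time)."""
    attribute: str
    symbol: object
    interval: Interval

    def active_at(self, t):
        return self.interval.contains(t)


def event_times(characterizers):
    """Sorted activation and (finite) deactivation times."""
    times = set()
    for c in characterizers:
        times.add(c.interval.start)
        if c.interval.end != math.inf:
            times.add(c.interval.end)
    return sorted(times)
```

The symbol field is left untyped on purpose, since the text allows symbols from any type, including expressions in an explicit time variable.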

4  System  Description 

A  system  description  is  a  finite  set  of  characterizers,  so  we  assume  explicitly  that  a  system  can  be  described  by  a 
finite  set  of  characterizers.  We  insist  that  only  a  finite  set  of  characterizers  be  active  at  any  one  time.  Since  each  of 
those characterizers is active over a positive interval, there is therefore some small interval thereafter during which
all of them are still active.

Everything we know about a system's behavior is described by characterizers and relationships among the characterizers.
Domain models and context can be written as characterizers, generally with large intervals.

4.1  Dynamics 

Relationships among characterizers are rules that define the dynamics. These rules take the form:

if these characterizers (with a list) are active on these intervals, then this new one is also active on this
other interval (not necessarily contained in the intersection of the original intervals).

Rules can contain variable identifiers, with implicit universal quantification.

Relationships  hold  on  intervals  and  the  combination  may  extend  the  range.  We  generate  new  characterizers  according 
to  the  relationships,  either  predictive  (range  extension)  or  deductive  (knowledge  extension). 

The  language  in  which  the  rules  are  written  is  important,  since  it  has  to  accommodate  notations  from  many  different 
types,  many  of  which  will  not  be  known  when  the  language  is  defined.  Some  basic  concepts  that  will  be  in  any  of 
these  languages  are  continuity  and  derivatives. 

It  is  important  to  remember  that  the  system  comes  first,  and  that  the  state  variables  are  our  choices  for  modeling 
and  understanding  the  system.  This  means  in  particular  that  the  coordinate  systems  we  use  are  temporary,  and  that 
the  constraints  among  the  state  variables  are  expressed  explicitly  as  relationships. 


4.2  Normalization  and  Continuation 


Characterizers may have overlapping intervals. Normalization is the process of breaking each characterizer into two
or more others, to fit the time scale. If t is an event time, and
(a : v; s, e)

is a characterizer with s < t < e, then we can replace it with two characterizers
(a : v; s, t) and (a : v; t, e).

If two characterizers use the same attribute,

(a : v; s, t)
and
(a : w; e, u),

then we say that the second one continues the first one iff they are adjacent in time, so t = e. Continuity considerations
in the transition from v to w at time t are treated in the next section.

In any system with a finite density of event times, if we split every characterizer that spans an event time, then we
end up with characterizers that start and stop at consecutive event times (though they may be continued by other
characterizers). This has some computational conveniences.
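A minimal sketch of this splitting step is shown below, assuming characterizers are represented as plain tuples (attribute, symbol, start, end); the representation and the function name are illustrative choices, not part of the notation itself.

```python
import math


def normalize(characterizers):
    """Split every characterizer at each event time it spans.

    Each characterizer is a tuple (attribute, symbol, start, end) over
    the half-open interval [start, end). Event times are all finite
    start and end times in the description.
    """
    times = sorted({t for (_, _, s, e) in characterizers
                    for t in (s, e) if t != math.inf})
    result = []
    for (a, v, s, e) in characterizers:
        # Cut points strictly inside the interval, plus both endpoints.
        cuts = [s] + [t for t in times if s < t < e] + [e]
        for lo, hi in zip(cuts, cuts[1:]):
            result.append((a, v, lo, hi))
    return result
```

After normalization every piece starts and stops at consecutive event times, so two pieces either cover exactly the same interval or do not overlap at all.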

If we have two characterizers
(a : v; t1, t2)
and
(a : w; t2, t3),

so that the second one continues the first, then we need some kind of explicit characterizer for the transition, active
in an interval containing the transition time. If there is a description u in an appropriate domain for which

u(t) = v(t), for t1 <= t < t2,
u(t) = w(t), for t2 <= t < t3,

then we can conclude
(a : u; t1, t3).

This is the opposite of normalization.

If there is an overlap, that is, if the two characterizers
(a : v; t1, t2)
and
(a : w; t3, t4)

have

[t1, t2) ∩ [t3, t4) non-empty,

and

v(t) = w(t) for t in [max(t1, t3), min(t2, t4)),
then we can also conclude

(a : u; min(t1, t3), max(t2, t4)),

where u agrees with v on [t1, t2) and with w on [t3, t4).


4.3 Continuation and Continuity

One aspect of continuity is transitions from one symbol to another across interval boundaries. The transition
relations are extra conditions that have to hold at the transition time (usually they are smoothness conditions for
model transitions).

A typical smoothness property is infinitesimal: for characterizers
(a : v; t0, t1)
and
(a : w; t1, t2),

we normally want smoothness, written

dv/dt at (t = t1-) = dw/dt at (t = t1+),

and continuity, written

v(t = t1-) = w(t = t1+).

Both of these are point conditions on the attributes and their derivatives, and we can consider only conditions on
attributes by using whatever derivatives are needed in the conditions: instead of
(a : v; t0, t1),
we  use 
(c  : 

and  write  our  smoothness  condition  as 


If we also require continuity in each attribute, so that
w(t = t1+) = w(t = t1),

then the upper limit in the previous expression can be omitted.
It is therefore clear that we must deal with point events at transitions
but not with point characterizers. If we make the transition continuity a property of the definition of continuation,
then we can assert it or not in any given model.

Of course, the expression t = t1- means that the interval [t1 - ε, t1) is part of the limit computation for every ε small
enough, so we might be able to use these intervals for some small enough ε without having to take the limits.

We will deal with these considerations in the simplest way possible. We have a characterizer that asserts continuity of
the relevant attribute across a larger interval, such as [t0, t2) above. The only place that the continuity characterizer
has new information is at the transition point t1, but we simply do not worry about the redundancy.

4.4  Characterizer  Semantics  and  Inference 

A  characterizer  is  what  we  want  to  assume  about  what  is  true  over  its  interval.  It  need  not  be  consistent  with 
the  other  characterizers  in  a  system  description;  we  explicitly  allow  false  assertions  here,  so  we  can  reason  using 
counterfactuals. 

4.4.1  Inference 

We can make inferences within intervals, according to some rules. If, say, there is a rule

s1 ∧ s2 => s3,

and two characterizers

(v : s1; t0, t1)

and

(w : s2; t2, t3)

with t0 < t2 < t1 < t3, then we can conclude

(v : s3; t2, t1).
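The inference step can be sketched as follows, under the assumption that a rule has the form "s1 and s2 imply s3" and that the conclusion holds on the intersection of the two intervals and is attached to the first attribute. The tuple layout and the function name are hypothetical.

```python
def infer(rule, c1, c2):
    """Apply a rule (s1, s2, s3) to two characterizers.

    Characterizers are tuples (attribute, symbol, start, end) over
    half-open intervals. If c1 carries s1 and c2 carries s2 and their
    intervals overlap, return a new characterizer carrying s3 on the
    overlap; otherwise return None.
    """
    s1, s2, s3 = rule
    a1, v1, b1, e1 = c1
    _, v2, b2, e2 = c2
    if (v1, v2) != (s1, s2):
        return None
    lo, hi = max(b1, b2), min(e1, e2)
    if lo >= hi:          # empty intersection: nothing can be concluded
        return None
    return (a1, s3, lo, hi)
```

The conclusion never extends beyond the interval on which both premises are active; extending the range is the job of prediction rules.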

4.4.2  Prediction 

We can also make inferences that extend intervals in some cases. They take the form: If
(v : s1; t0, t1)
and
(w : s2; t0, t1)

are characterizers with t0 < t1, then there is a characterizer
(v : s3; t2, t3)

for some t2, t3, with t0 < t2 < t1 < t3.

4.4.3 Truth Maintenance

Because we do not presume that the characterizers in a system are truths, we need to be careful about
when they can be used together, especially in the inference and prediction processes. Since the characterizers
themselves are time dependent, we need to keep track of how and when each one
was derived (how tells us about hypotheses and inference rules; when helps us with temporal checks).

We also need a way to distinguish which characterizers we DO want to be true, so that different characterizers
can be compared and contrasted within the same context. We might want to consider computing various
maximal consistent sets of irredundant assertions as an aid in this process.

Various rules can be activated that lead to new conclusions in an interval, which can supersede old ones. We also
assume partial deduction, not total. We therefore need to use some kind of non-monotonic logic.


4.5  Analysis 

We want tools with enough analytic power to help reduce our reliance on simulation, so we can make reliable predictions.

All of our computations are performed from the symbols active at a given time. The advantage of dealing explicitly
with time in this formulation is that we can sit outside the usual sequencing of events,
look at the entire time line, and piece together parts of the models that we know more about, regardless of whether
or not they are the first ones in our time interval of interest.

We can also perform the deductions in an order that is different from the order imposed by timing, using any of a
number of simple mechanisms, such as rule-based systems or rewrite logics; both are being investigated.


5  Examples 

This section contains several examples that illustrate the utility of the notation.


5.1  ODE 

A simple example that shows range extension is an ordinary differential equation (ODE). For ODEs, the solution
method is part of changing an ODE into a set of characterizers.

So let us consider a simple second-order ODE for the sine function,

y'' = -y,
y(0) = 0,
y'(0) = 1,

and solve it with Euler's method (a particularly bad one for this kind of problem).

First, we transform the equations into a first-order system (in the usual way, taking x = y'),
x' = -y,
y' = x,
x(0) = 1,
y(0) = 0,

and we also define z = x' = y''.


5.1.1  First-Order 

Now the way Euler's method works is by linear extrapolation, so for a given time t = t0, if we have
x(t0) = x0,
y(t0) = y0,

then we have

z0 = z(t0) = -y0,
and we take

x(t) = x0 + z0 * (t - t0),
y(t) = y0 + x0 * (t - t0),
for t in some small interval

[t0, t1 = t0 + dt).

The characterizers that describe this situation are:

(x : x0 + z0 * (t - t0); t0, t0 + dt),
(y : y0 + x0 * (t - t0); t0, t0 + dt),

which we want to be true for all choices of x0, y0, t0, dt (which ones we actually use in our system description
depends on how we choose the time intervals in the solution).

The characterizers that describe the initial conditions are difficult, because they cannot be described with half-open
intervals of the shape we have thus far described:

(x : 1; 0),
(y : 0; 0),

which is always going to be a problem in systems that start at a certain time.

In  a  more  sophisticated  system,  the  choice  of  next  time  interval  would  depend  on  the  computed  accuracy  of  the 
current  solution. 

For this example, we simply make all the time intervals the same, and say that the characterizer pair
(x : x1 + z1 * (t - t1); t1, t1 + dt),
(y : y1 + x1 * (t - t1); t1, t1 + dt)

propagates the pair

(x : x0 + z0 * (t - t0); t0, t0 + dt),
(y : y0 + x0 * (t - t0); t0, t0 + dt)

iff

x1 = x0 + z0 * dt,
y1 = y0 + x0 * dt,
t1 = t0 + dt,

which are the conditions for the first pair to meet the second (the condition z1 = -y1 is part of the definition of
these characterizer pairs).

Extending the iteration, we have
x(0) = 1,
y(0) = 0,
x(k+1) = x(k) - y(k) * dt,
y(k+1) = y(k) + x(k) * dt,

which can be written as a vector equation (we put the matrix on the right so we can use row vectors)

(x, y)(0) = (1, 0),
(x, y)(k+1) = (x, y)(k) [  1   dt ]
                        [ -dt   1 ],

so if we write I for the identity matrix and J for the matrix

[  0   1 ]
[ -1   0 ],

then we have (with X = (x, y))

X(0) = (1, 0),
X(k+1) = X(k) (I + J * dt),

so

X(k) = (1, 0) (I + J * dt)^k,

which can be computed exactly.

Since the eigenvalues of (I + J * dt) are 1 ± i * dt, which have squared magnitude 1 + dt^2, the successive powers of the matrix
diverge for any dt > 0, and therefore so does the iteration.
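The divergence is easy to check numerically. The short sketch below (function names are illustrative) iterates the Euler step from (1, 0) and returns the distance from the origin; since each step multiplies the squared radius by 1 + dt^2, the radius after k steps should equal (1 + dt^2)^(k/2), whereas the true solution stays on the unit circle.

```python
import math


def euler_step(x, y, dt):
    """One Euler step for x' = -y, y' = x."""
    return x - y * dt, y + x * dt


def euler_radius(k, dt):
    """Distance from the origin after k Euler steps from (1, 0)."""
    x, y = 1.0, 0.0
    for _ in range(k):
        x, y = euler_step(x, y, dt)
    return math.hypot(x, y)
```

For example, with dt = 0.1 the radius has grown by roughly 64 percent after 100 steps, and it keeps growing geometrically for any positive step size.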


5.1.2  Second-Order  Example 

In this section, we use the same differential equation problem, with a different solver, a second-order one that is
almost able to converge properly. We therefore have
x' = -y,
y' = x,
x(0) = 1,
y(0) = 0,

as above. Our initial conditions are
(x : 1; 0),
(y : 0; 0),

as before.

The method we use is a simplified second-order Runge-Kutta method [?], [?], which basically amounts to averaging
the usual Euler approximation in an interval with a linear reapproximation at the endpoint of the interval. At a
given time t = t0, if we have
x(t0) = x0,
y(t0) = y0,
then we have

x(t) = x0 - y0 * dt - x0 * dt^2/2,
y(t) = y0 + x0 * dt - y0 * dt^2/2,

and it is the extra dt^2 terms that make the method second-order.

As above, we assume equal time intervals and get an iteration

x(0) = 1,
y(0) = 0,
x(k+1) = x(k) - y(k) * dt - x(k) * dt^2/2,
y(k+1) = y(k) + x(k) * dt - y(k) * dt^2/2,

which can be written as a vector equation

(x, y)(0) = (1, 0),
(x, y)(k+1) = (x, y)(k) [ 1 - dt^2/2      dt     ]
                        [    -dt      1 - dt^2/2 ],

and we have as above

X(0) = (1, 0),
X(k+1) = X(k) (I * (1 - dt^2/2) + J * dt),
so

X(k) = (1, 0) (I * (1 - dt^2/2) + J * dt)^k,
which can be computed exactly.

Since the eigenvalues of (I * (1 - dt^2/2) + J * dt) are 1 - dt^2/2 ± i * dt, which have squared magnitude 1 + dt^4/4, this simple
method still does not converge (but diverges much more slowly).
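One way to see the eigenvalue directly is to identify the row vector (x, y) with the complex number x + i*y; the second-order step is then a multiplication by the complex number (1 - dt^2/2) + i*dt. The sketch below uses this identification, which is an illustrative device and not part of the original derivation; the function names are hypothetical.

```python
import math


def second_order_multiplier(dt):
    """Complex multiplier equivalent to the second-order step."""
    return complex(1 - dt ** 2 / 2, dt)


def second_order_iterate(k, dt):
    """Apply k second-order steps to (x, y) = (1, 0)."""
    z = complex(1.0, 0.0)
    lam = second_order_multiplier(dt)
    for _ in range(k):
        z *= lam
    return z.real, z.imag
```

The squared modulus of the multiplier is 1 + dt^4/4, so the radius still grows, but for dt = 0.1 the growth per step is about 1.25e-5 instead of about 5e-3 for Euler's method.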


5.1.3  Higher-Order  Example 

A similar analysis of the usual 4th-order Runge-Kutta method leads to an iteration
x(t) = x0 - y0 * dt - x0 * dt^2/2 + y0 * dt^3/6 + x0 * dt^4/24,
y(t) = y0 + x0 * dt - y0 * dt^2/2 - x0 * dt^3/6 + y0 * dt^4/24,
with matrix

[ 1 - dt^2/2 + dt^4/24       dt - dt^3/6       ]
[    -dt + dt^3/6        1 - dt^2/2 + dt^4/24  ]

and squared eigenvalue magnitude 1 - dt^6/72 + dt^8/24^2, which is still not equal to one (for small dt it is slightly
less than one), so the iterates still drift off the circle. In fact, since this equation (in
(x, y) space) represents moving around a circle, any extrapolation method based on tangents at a single point will
fail, since all of the tangent vectors point outward from the circle. We note that the iteration equations do have the
first terms of the usual Maclaurin series for sin(dt) and cos(dt), so we try out a different iteration:
$$x(t) = x_0 \cos(dt) - y_0 \sin(dt),$$
$$y(t) = y_0 \cos(dt) + x_0 \sin(dt),$$

which can be written as a vector equation

$$(x, y)(0) = (1, 0),$$
$$(x, y)(k+1) = (x, y)(k) \begin{pmatrix} \cos(dt) & \sin(dt) \\ -\sin(dt) & \cos(dt) \end{pmatrix},$$

and we have as above

$$X(0) = (1, 0),$$
$$X(k+1) = X(k)\,(I \cos(dt) + J \sin(dt)),$$

so

$$X(k) = (1, 0)\,(I \cos(dt) + J \sin(dt))^k = (1, 0)\,(I \cos(k\,dt) + J \sin(k\,dt)),$$

and

$$x(k\,dt) = \cos(k\,dt),$$
$$y(k\,dt) = \sin(k\,dt),$$

from which we can hazard a guess as to the correct solution.
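
The same kind of check can be run for this iteration. The sketch below (illustrative names, plain Python) applies the cosine/sine update repeatedly from (1, 0); the result should match $(\cos(k\,dt), \sin(k\,dt))$ and stay on the unit circle up to rounding.

```python
import math

def rotation_step(x, y, dt):
    """One step of the iteration built from cos(dt) and sin(dt)."""
    c, s = math.cos(dt), math.sin(dt)
    return x * c - y * s, y * c + x * s

def iterate(steps, dt):
    """Apply `steps` rotation steps starting from (x, y) = (1, 0)."""
    x, y = 1.0, 0.0
    for _ in range(steps):
        x, y = rotation_step(x, y, dt)
    return x, y
```

For example, `iterate(1000, 0.1)` can be compared against `(math.cos(100.0), math.sin(100.0))`.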


5.2  Measurement 


Let us take a simple system in which the velocity and position are occasionally known through inexact measurement.
Our  state  variables  are  p  for  the  position,  v  for  the  velocity,  and  a  for  the  unknown  acceleration. 

We assume that the acceleration $a$ is bounded by some constant $A$, so that for any times $t_0 < t_1$

$$|v(t_1) - v(t_0)| < |t_1 - t_0| \cdot A.$$

We  assume  that  we  have  characterizers 

$$(a(t);\ t_{i-1},\ t_i)$$

that  describe  the  acceleration,  and  model  characterizers 

Therefore, we can compute the velocity and position by

$$v(t) = v(t_0) + \int_{t_0 < u < t} a(u)\,du,$$
$$p(t) = p(t_0) + \int_{t_0 < u < t} v(u)\,du.$$

The problem is to choose measurement times and variables that maintain a certain accuracy in the estimates of
position.

We assume that we can measure position within a bound

$$|p_{meas}(t) - p(t)| < \ldots$$

and that we can measure velocity within a bound, but that we want to keep our estimate $p_{est}$ of position either
more accurately than the position measurement error (this might or might not be possible) or using as few
measurements as possible.

We assume first that $x_0$ and $v_0$ are known, and consider an interval $(t_0, t_1)$. We compute

$$|v(t_1) - v_0| < |t_1 - t_0| \cdot A,$$

and therefore

$$|x(t_1) - x_0| < \tfrac{1}{2} \cdot |t_1 - t_0|^2 \cdot A,$$

so we would have to choose

$$\Delta t = t_1 - t_0$$

so that

$$\Delta t < |V/A|$$

to keep the velocity within bounds, and

$$(\Delta t)^2 < |2 \cdot P/A|$$

to keep the position within bounds.

But of course, we don't know $x(t)$ or $v(t)$ after the first time interval, so we need to change the previous derivation
a bit.

We assume that we know $x_0$ and $v_0$, and that

$$|x(t_0) - x_0| < \Delta x_0$$

describes the accuracy of our knowledge of $x(t)$ at time $t = t_0$, and

$$|v(t_0) - v_0| < \Delta v_0$$

describes the accuracy of our knowledge of $v(t)$ at time $t = t_0$. Then the above inequalities become

$$|v(t_1) - v_0| < \Delta v_0 + |t_1 - t_0| \cdot A,$$

and therefore

$$|x(t_1) - x_0| < \Delta x_0 + |t_1 - t_0| \cdot \Delta v_0 + \tfrac{1}{2} \cdot |t_1 - t_0|^2 \cdot A,$$

so we have to have

$$\Delta t < |(V - \Delta v_0)/A|$$

to keep the velocity within bounds, and

$$(\Delta t)^2 + (2 \cdot \Delta v_0/A) \cdot (\Delta t) < |2 \cdot (P - \Delta x_0)/A|$$

to keep the position within bounds.
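
The two bounds can be turned into an upper limit on the measurement interval. The sketch below is only an illustration under the stated inequalities: the function name and argument names are assumptions, and it returns the limiting value of $\Delta t$ (the strict inequalities hold for any smaller interval).

```python
import math

def max_interval(V, P, A, dv0=0.0, dx0=0.0):
    """Upper limit on dt allowed by both bounds, or 0.0 if the budget is spent.

    Velocity bound: dv0 + dt*A <= V
    Position bound: dx0 + dt*dv0 + 0.5*dt^2*A <= P
    """
    if A <= 0 or dv0 >= V or dx0 >= P:
        return 0.0
    dt_velocity = (V - dv0) / A
    # Positive root of 0.5*A*dt^2 + dv0*dt - (P - dx0) = 0.
    dt_position = (-dv0 + math.sqrt(dv0**2 + 2 * A * (P - dx0))) / A
    return min(dt_velocity, dt_position)
```

With exact initial knowledge (`dv0 = dx0 = 0`) this reduces to the earlier limits $V/A$ and $\sqrt{2P/A}$; any initial uncertainty shortens the admissible interval.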

At this point, we are stuck unless we can say something more helpful about the acceleration. Suppose we know that
the  acceleration  jumps  around,  and  that  it  has  a  distribution  of  values  with  mean  0  and  variance  R.  In  this  case,  we 
might  be  able  to  reduce  the  estimates  for  position  and  velocity  and  improve  the  time  intervals. 
Abstract

The allocation problem has always been one of the fundamental issues in building applications
on distributed computing systems (DCS). For real-time applications on DCS, the allocation
problem should directly address the issues of task and communication scheduling. In this
context, the allocation of tasks has to fully utilize the available processors, and the scheduling
of tasks has to meet the specified timing constraints. Clearly, the execution of tasks under
the allocation and schedule has to satisfy the precedence, resource, and other synchronization
constraints among them.

Recently, timing requirements have emerged for real-time systems in which relative timing
constraints are imposed on the consecutive executions of each task and inter-task temporal
relationships are specified across task periods. In this paper we consider the allocation and
scheduling problem of periodic tasks with such timing requirements. Given a set of periodic
tasks, we consider the least common multiple (LCM) of the task periods. Each task is extended
to several instances within the LCM. The scheduling window for each task instance is derived to
satisfy the timing constraints. We develop a simulated annealing algorithm as the overall control
algorithm. An example problem, a sanitized version of the Boeing 777 Aircraft Information
Management System, is solved by the algorithm. Experimental results show that the algorithm
solves the problem in a reasonable time.


*This work is supported in part by Honeywell under N00014-91-C-0195 and Army/Phillips under DASG-60-92-
C-0055. The views, opinions, and/or findings contained in this report are those of the author(s) and should not be
interpreted as representing the official policies, either expressed or implied, of Honeywell or Army/Phillips.
1 Introduction


The task allocation and scheduling problem is one of the basic issues of building real-time
applications on a distributed computing system (DCS). A DCS is typically modeled as a collection of
processors interconnected by a communication network. For hard real-time applications, the
allocation of tasks over the DCS is to fully utilize the available processors and the scheduling is to meet
their timing constraints. Failure to meet the specified timing constraints or inability to respond
correctly can result in disastrous consequences.

For hard real-time applications, such as avionics systems and nuclear power systems, the
approach to guaranteeing the critical timing constraints is to allocate and schedule tasks a priori.
The essential solution is to find a static allocation in which there exists a feasible schedule for the
given task sets. Ramamritham [Ram90] proposes a global view where the purpose of allocation
should directly address the schedulability of processors and communication network. A heuristic
approach is taken to determine an allocation and find a feasible schedule under the allocation.
Tindell et al. [TBW92] take the same global view and exploit a simulated annealing technique
to allocate periodic tasks. A distributed rate-monotonic scheduling algorithm is implemented. In
each period a task must execute once before the specified deadline. The transmission times for
the communications are taken into account by subtracting the total communication time from the
deadline, making the execution of the task more stringent.

Simply assuring that one instance of each task starts after the ready time and completes before
the specified deadline is not enough. Some real-time applications have more complicated timing
constraints for the tasks. For example, relative timing constraints may be imposed upon
the consecutive executions of a task, in which the scheduling of two consecutive executions of a
periodic task must be separated by a minimum execution interval. Communication latency can be
specified to make sure that the time difference between the completion of the sending task and the
start of the receiving task does not exceed the specified value. The Boeing 777 Aircraft Information
Management System is such an example [CDHC94]. For such applications, the algorithms proposed
in the literature do not work because the timing constraints are imposed across the periods of tasks. In
this paper, we consider the relative timing constraints for real examples of real-time applications
in Section 2. Based on the task characteristics, we propose the approach to allocate and schedule
these applications in Section 3. A simulated annealing algorithm is developed to solve the problem,
in which the reduction of the search space is given in Section 4. In Section 5, we evaluate the
practicality and show the significance of the algorithm. Instead of randomly generating ad hoc
test cases, we apply the algorithm to a real example, the Boeing 777 AIMS with
various numbers of processors, and report the experimental results.
2 Problem Description


Various kinds of periodic task models have been proposed to represent real-time system
characteristics. One of them is to model an application as an independent set of tasks, in which each
task is executed once every period under the ready time and deadline constraints. Synchronization
(e.g., precedence and mutual exclusion) and communications are simply ignored. Another model,
which takes the precedence relationship and communications into account, is to model the application
as a task graph. In a task graph, tasks are represented as nodes while communications and
precedence relationships between tasks are represented as edges. Absolute timing constraints can
be imposed on the tasks. Tasks have to be allocated and scheduled to meet their ready time and
deadline constraints in the presence of synchronization and communications. The deficiency
of task graph modeling is its inability to specify relative constraints across task periods. For
example, one cannot specify the minimum separation interval between two consecutive executions
of the same task.

In the work [CA93], we modified the real-time system characteristics by taking into account
the relative constraints on the instances of a task. We considered the scheduling problem of
periodic tasks with relative timing constraints. We analyzed the timing constraints and derived
the scheduling window for each task instance. Based on the scheduling window, we presented
the time-based approach of scheduling a task instance. The task instances are scheduled one by
one based on their priorities assigned by the proposed algorithms. In this paper we augment the
real-time system characteristics by considering the inter-task communication on DCS.

2.1  Task  Characteristics 

The problem considered in this chapter has the following characteristics.

• The Fundamentals: A task is denoted by the 4-tuple $\langle p_i, e_i, \lambda_i, \eta_i \rangle$ denoting the period,
computation time, low jitter and high jitter respectively. One instance of a task is executed
each period. The execution of a task instance is non-preemptable. The start times of two
consecutive instances of task $\tau_i$ are at least $p_i - \lambda_i$ and at most $p_i + \eta_i$ apart. Let $s_i^j$ and
$f_i^j$ be the start time and finish time of task instance $\tau_i^j$ respectively. The timing constraints
specified in Equations 1 through 4 must be satisfied.

$$f_i^j = s_i^j + e_i \qquad (1)$$
$$s_i^{n_i+1} = s_i^1 + LCM \qquad (2)$$
$$s_i^j \ge s_i^{j-1} + p_i - \lambda_i \qquad (3)$$
$$s_i^j \le s_i^{j-1} + p_i + \eta_i \qquad (4)$$
$$\forall j = 2, \ldots, n_i + 1.$$

• Asynchronous Communication: Tasks communicate with each other by sending and
receiving data or messages. The frequencies of the sending and receiving tasks of a communication
can be different. In consequence, communications between tasks may cross the task periods.
When such asynchronous communications occur, the semantics of undersampling is assumed:
when two tasks of different frequencies are communicating, the message is scheduled only at
the lower rate. For example, if task A (of 10 Hz) sends a message to task B (of 5 Hz), then
in every 200 ms, one of two instances of task A has to send a message to one instance of
task B. If the sending and receiving tasks are assigned to the same processor, then a local
communication occurs. We assume the time taken by a local communication is negligible.
When an interprocessor communication (IPC) occurs, the communication must be scheduled
on the communications network between the end of the sending task execution and the start
of the receiving task execution. The transmission time required to communicate the message
$i$ over the network is denoted by $\mu_i$.

• Communication Latency: Each communication is associated with a communication
latency which specifies the maximum separation between the start time of the sending task and
the completion time of the receiving task.

• Cyclic Dependency: Research on the allocation problem has usually focused on acyclic
task graphs [Ram90, HS92]. Given an acyclic task graph $G = (V, E)$, if the edge from task
A to task B is in $E$ then the edge from B to A cannot be in $E$. The use of acyclic task
graphs excludes the possibility of specifying the cyclic dependency among tasks. For example,
consider the following situation in which one instance of task A cannot start its execution
until it receives data from the last instance of task B. After the instance of task A finishes
its execution, it sends data to the next instance of task B. Since tasks A and B are periodic,
the communication pattern goes on throughout the lifetime of the application. To be able to
accommodate this situation, we take cyclic dependency into consideration.

The timing constraints described above are shown in Figure 1. For periodic tasks A and B, the
start times of each and every instance of task execution and communication are pre-scheduled such
that (1) the execution intervals fall into the range between $p - \lambda$ and $p + \eta$ and (2) the time window
between the start time of the sending task and the completion time of the receiving task is less than the
latency of the communication. In Figure 2, we illustrate examples of all possible communication
patterns considered in this paper. The description of the communications in the task system is in
the form of "From sender-task-id (of frequency) To receiver-task-id (of frequency)". If the sender
frequency is n times the receiver frequency and no cyclic dependency is involved, then one
of every n instances of the sending task has to communicate with one instance of the receiving
task. (Examples of this situation are shown in Figures 2.a.1 and 2.a.2.) Likewise, for the case in
which the receiver frequency is n times that of the sender frequency and no cyclic dependency is
present, the patterns are shown in Figures 2.b.1 and 2.b.2. For an asynchronous communication, the
sending (receiving) task in low frequency sends (receives) the message to (from) the nearest receiving
(sending) task as shown in Figure 2.a (2.b). The cases where cyclic dependency is considered are
shown in Figures 2.c and 2.d.

Figure 1: Relative Timing Constraints. [Timeline of processor and network activity for tasks A and B:
consecutive starts of A lie between $p_A - \lambda_A$ and $p_A + \eta_A$ apart (likewise for B), and each
communication satisfies T < Latency (A to B) or T < Latency (B to A).]



2.2  System  Model 

A  real-time  DCS  consists  of  a  number  of  processors  connected  together  by  a  communications 
network.  The  execution  of  an  instance  on  a  processor  is  nonpreemptable.  To  provide  predictable 
communication  and  to  avoid  contention  for  the  communication  channel  at  the  run  time,  we  make  the 
following  assumptions.  (1)  Each  IPC  occurs  at  the  pre-scheduled  time  as  the  schedule  is  generated. 
(2)  At  most  one  communication  can  occur  at  any  given  time  on  the  network. 
Figure 2: Possible Communication Patterns. [Panels include (a.2) From A (of 10 Hz) to B (of 5 Hz) and
(b.2) From A (of 5 Hz) to B (of 10 Hz), each drawn over a 200 ms frame.]



2.3  Problem  Formulation 


We consider the static assignment and scheduling in which a task is the finest granularity object
of assignment and an instance is the unit of scheduling. We applied the simulated annealing
algorithm [KGV83] to solve the problem of real-time periodic task assignment and scheduling with
hybrid timing constraints. In order to make the execution of instances satisfy the specifications
and meet the timing constraints, we consider a scheduling frame whose length is the least common
multiple (LCM) of all periods of tasks. Given a task set $\Gamma$ and its communications $C$, we construct
a set of task instances, $I$, and a set of multiple communications, $M$. We extend each task $\tau_i \in \Gamma$
to $n_i$ instances, $\tau_i^1$, $\tau_i^2$, ..., and $\tau_i^{n_i}$. These $n_i$ instances are added to $I$. Each communication
$\tau_i \mapsto \tau_j \in C$ is extended to $\min(n_i, n_j)$¹ undersampled communications where $n_i = LCM/p_i$ and
$n_j = LCM/p_j$. These multiple communications are added to $M$. The extension can be stated as follows.

• If $n_i < n_j$, then $\tau_i \mapsto \tau_j$ is extended to $\tau_i^1 \mapsto \tau_j^?$, $\tau_i^2 \mapsto \tau_j^?$, ..., and $\tau_i^{n_i} \mapsto \tau_j^?$.

• If $n_i > n_j$, then $\tau_i \mapsto \tau_j$ is extended to $\tau_i^? \mapsto \tau_j^1$, $\tau_i^? \mapsto \tau_j^2$, ..., and $\tau_i^? \mapsto \tau_j^{n_j}$.

• If $n_i = n_j$, then $\tau_i \mapsto \tau_j$ is extended to $\tau_i^1 \mapsto \tau_j^1$, $\tau_i^2 \mapsto \tau_j^2$, ..., and $\tau_i^{n_i} \mapsto \tau_j^{n_j}$.

A task ID with a superscript of question mark indicates some instance of the task. For example,
$\tau_i^1 \mapsto \tau_j^?$ means that $\tau_i^1$ communicates with some instance of $\tau_j$. We describe how we assign the
nearest instance for each communication in Section 4.1.2.
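
The extension of tasks and communications over one scheduling frame can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary and tuple layouts are assumptions, periods are taken to be integers in a common time unit, and the string "?" stands for the instance that is chosen later.

```python
from math import gcd
from functools import reduce

def lcm(values):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)

def expand(periods, communications):
    """Expand tasks and communications over one scheduling frame.

    periods:        {task_id: period} in integer time units (assumed layout)
    communications: list of (sender_id, receiver_id) pairs
    Returns (frame, instances, multiple); '?' marks an instance chosen later.
    """
    frame = lcm(list(periods.values()))
    n = {task: frame // p for task, p in periods.items()}
    instances = [(task, j) for task in periods for j in range(1, n[task] + 1)]
    multiple = []
    for snd, rcv in communications:
        # Undersampling: only min(n_snd, n_rcv) communications per frame.
        for j in range(1, min(n[snd], n[rcv]) + 1):
            s_inst = j if n[snd] <= n[rcv] else "?"
            r_inst = j if n[rcv] <= n[snd] else "?"
            multiple.append(((snd, s_inst), (rcv, r_inst)))
    return frame, instances, multiple
```

For a 10 Hz task A (period 100) sending to a 5 Hz task B (period 200), the frame is 200 and the single extended communication has an undetermined sender instance and receiver instance 1.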

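The problem can be formulated as follows. Given a set of task instances, $I$, and its communications
$M$, we find an assignment $\phi$, a total ordering $\sigma_m$ of all instances, and a total ordering $\sigma_c$ of all
communications to minimize

$$E(\phi, \sigma_m, \sigma_c) = \sum_{i,j} \delta(p_i - \lambda_i - s_i^{j+1} + s_i^j) + \sum_{i,j} \delta(s_i^{j+1} - s_i^j - p_i - \eta_i)$$
$$\qquad + \sum_{i,j} \delta(f_i^j - d_i^j) + \sum_{\tau_i^j \mapsto \tau_k^l} \delta(f_k^l - s_i^j - \mathrm{Latency}(\tau_i \text{ to } \tau_k)) \qquad (5)$$

subject to $s_i^j \ge r_i^j$ and $S(\tau_i^j \mapsto \tau_k^l, \sigma_c) \ge f_i^j$,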


where

• $s_i^j$ is the start time of $\tau_i^j$ under $\sigma_m$.

• $f_i^j$ is the completion time of $\tau_i^j$ under $\sigma_m$.

• $r_i^j = p_i \times (j - 1) + r_i$, and $d_i^j = p_i \times (j - 1) + d_i$.

• $\delta(x) = 0$, if $x \le 0$; and $= x$, if $x > 0$.

• $\phi(\tau_i)$ is the ID of the processor which $\tau_i$ is assigned to.

• $\tau_i^j \mapsto \tau_k^l$ is the communication from $\tau_i^j$ to $\tau_k^l$. If $\phi(\tau_i) = \phi(\tau_k)$, then $\tau_i^j \mapsto \tau_k^l$ is a local
communication.

• $S(c, \sigma_c)$ is the start time of communication $c$ on the network under $\sigma_c$.

• $F(c, \sigma_c)$ is the completion time of communication $c$ on the network under $\sigma_c$.

¹Due to undersampling, when an asynchronous communication is extended to multiple communications, the
number of multiple communications is the smaller number of sender and receiver instances.

The minimum value of $E(\phi, \sigma_m, \sigma_c)$ is zero. It occurs when the executions of all instances
meet the jitter constraints and all communications meet their latency constraints. A feasible
multiprocessor schedule can be obtained by collecting the values of $s_i^j$ and $f_i^j$, $\forall$ $i$ and $j$. Likewise,
a feasible network schedule can be obtained from the $S(c, \sigma_c)$s and $F(c, \sigma_c)$s.

Since the task system is asynchronous and the communication pattern could be in the form of
cyclic dependency, we solve the problem of finding a feasible solution $(\phi, \sigma_m, \sigma_c)$ by exploiting the
cyclic scheduling technique and embedding the technique into the simulated annealing algorithm.
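
To make the role of the penalty function $\delta$ concrete, the sketch below evaluates the jitter and latency contributions to the energy for given start and finish times. It is a simplified illustration under assumed data layouts (plain lists and numbers), and it covers only the jitter and latency terms; names such as `jitter_energy` are not from the paper.

```python
def delta(x):
    """Penalty: zero when a constraint holds (x <= 0), the excess otherwise."""
    return x if x > 0 else 0.0

def jitter_energy(starts, period, low_jitter, high_jitter, frame):
    """Jitter penalty for one task.

    starts: start times s^1..s^n of the task's instances within one frame.
    The (n+1)-th start is s^1 + frame, as in Equation 2.
    """
    s = list(starts) + [starts[0] + frame]
    total = 0.0
    for prev, nxt in zip(s, s[1:]):
        gap = nxt - prev
        total += delta(period - low_jitter - gap)   # instances too close together
        total += delta(gap - period - high_jitter)  # instances too far apart
    return total

def latency_energy(sender_start, receiver_finish, latency):
    """Latency penalty for one communication."""
    return delta(receiver_finish - sender_start - latency)
```

A schedule contributes zero energy exactly when every gap lies within $[p_i - \lambda_i, p_i + \eta_i]$ and every sender-start to receiver-finish span is within its latency.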

3  The  Approach 

3.1  Bounds  of  a  Scheduling  Window 

Define the scheduling window for a task instance as the time interval during which the task can
start. Traditionally, the lower and upper bounds of the scheduling window for a task instance are
called earliest start time (est) and latest start time (lst) respectively. These values are given and
independent of the start times of the preceding instances.

We consider the scheduling of periodic tasks with relative timing constraints described in
Equations 3 and 4. The scheduling window for a task instance is derived from the start times of its
preceding instances. A feasible scheduling window for a task instance $\tau_i^j$ is a scheduling window
in which any start time in the window makes the timing relation between $s_i^{j-1}$ and $s_i^j$ satisfy
Figure 3: Insertion of a new time slot (before and after)


3.2.1 Recurrence

Given any solution point $(\phi, \sigma_m, \sigma_c)$, we construct the schedule by inserting time slots to the linked
lists. Let $\sigma_m$: task_id × instance_id → integer. The insertion of a time slot for $\tau_i^j$ precedes that for
$\tau_k^l$ if $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^l)$. Recall that Equations 6 and 7 specify the bounds of the scheduling window
for a task instance. Due to the communications, $est(\tau_i^j)$ in Equation 6 may not be the earliest time for $\tau_i^j$. We define
the effective start time as the time when (1) the hybrid constraints are satisfied and (2) $\tau_i^j$ receives
all necessary data or messages from all the senders.

Given the effective start time $r$ and the assignment of $\tau_i$ (i.e. $p = \phi(\tau_i)$), a time slot of processor
$p$ is assigned to $\tau_i^j$ where start_time ≥ $r$ and finish_time - start_time = $e_i$. Note that we have
to make sure the new time slot does not overlap existing time slots. Since (1) the executions of
all instances within one scheduling frame recur in the next scheduling frame and (2) it is possible
that the time slot for some instance is over LCM, we subtract one LCM from the start_time or
finish_time if it is greater than LCM. It means the time slot for this task instance will be modulated
Equations 3 and 4. Formally, given $s_i^1$, $s_i^2$, ..., and $s_i^{j-1}$, the problem is to derive the feasible
scheduling window for $\tau_i^j$ such that a feasible schedule can be obtained if $\tau_i^j$ is scheduled within
the window.

Proposition 1 [CA93]: Let the est and lst of $\tau_i^j$ be

$$est(\tau_i^j) = \max\{(s_i^{j-1} + p_i - \lambda_i),\ (s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i)\}, \qquad (6)$$

and

$$lst(\tau_i^j) = \min\{(s_i^{j-1} + p_i + \eta_i),\ (s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i)\}. \qquad (7)$$

If $s_i^j$ is in between $est(\tau_i^j)$ and $lst(\tau_i^j)$, then the estimated est and lst of $s_i^{j+1}$, based on $s_i^1$ and $s_i^j$,
specify a feasible window.
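
Equations 6 and 7 can be evaluated directly. The following sketch is illustrative (the function and parameter names are assumptions): it computes the window for instance $j$ from the first start time and the preceding start time.

```python
def scheduling_window(first_start, prev_start, j, n, period, low_jitter, high_jitter):
    """Return (est, lst) for instance j (2 <= j <= n + 1) per Equations 6 and 7."""
    remaining = n - j + 1  # gaps left between instance j and instance n + 1
    est = max(prev_start + period - low_jitter,
              first_start + (j - 1) * period - remaining * high_jitter)
    lst = min(prev_start + period + high_jitter,
              first_start + (j - 1) * period + remaining * low_jitter)
    return est, lst
```

For a task with $n_i = 2$, $p_i = 100$, $\lambda_i = 5$, $\eta_i = 10$ and $s_i^1 = 0$, the window for the second instance is [95, 105]. Whatever start is then chosen inside it, the window for instance $n_i + 1$ collapses to the single point $s_i^1 + LCM = 200$, as Equation 2 requires.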


3.2  Cyclic  Scheduling  Technique 

The basic approach of scheduling a set of synchronous periodic tasks is to consider the execution
of all instances within the scheduling frame whose length is the LCM of all periods. The release
times of the first periods of all tasks are zero. As long as one instance is scheduled in each period
within the frame and these executions meet the timing constraints, a feasible schedule is obtained.
In a feasible schedule, all instances complete their executions before the LCM.

On the other hand, in asynchronous task systems, as depicted in Figure 2 in which the LCM
is 200 ms, the periods of the two tasks are out of phase. It is possible that the completion time
of some instance in a feasible schedule exceeds the LCM. To find a feasible schedule for such an
asynchronous system, a technique of handling the time value which exceeds the LCM is proposed.

The technique is based on the linked list structure described in the work [CA93]. Without loss
of generality, we assume the minimum release time among the first periods of all tasks is zero. We
keep a linked list for each processor and a separate list for the communication network. Each
element in the list represents a time slot assigned to some instance or communication. The fields of
a time slot of some processor $p$ are: (1) task id $i$ and instance id $j$ indicate the identifier of the time slot.
(2) start time st and finish time ft indicate the start time and completion time of $\tau_i^j$ respectively.
(3) prev ptr and next ptr are the pointers to the preceding and succeeding time slots respectively.
The list is arranged in increasing order of start_time. Any two time slots are nonoverlapping.
Since the execution of an instance is nonpreemptable, the time difference between start_time and
finish_time equals the execution time of the task.
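
A possible shape for this structure is sketched below. It is a first-fit illustration under stated assumptions rather than the time-based algorithm of [CA93]: a Python list kept sorted by start time stands in for the doubly linked list, and wrap-around past the LCM is not handled here.

```python
from dataclasses import dataclass

@dataclass
class TimeSlot:
    task_id: int
    instance_id: int
    start_time: float
    finish_time: float

def insert_slot(slots, task_id, instance_id, earliest, exec_time):
    """Insert a non-overlapping slot starting no earlier than `earliest`.

    `slots` is kept sorted by start_time; a Python list stands in for the
    doubly linked list (prev/next pointers) described in the text.
    """
    start = earliest
    position = len(slots)
    for index, slot in enumerate(slots):
        if start + exec_time <= slot.start_time:
            position = index          # the gap before this slot is large enough
            break
        if slot.finish_time > start:
            start = slot.finish_time  # pushed past an occupied slot
    new_slot = TimeSlot(task_id, instance_id, start, start + exec_time)
    slots.insert(position, new_slot)
    return new_slot
```

Each insertion scans the list once, so the slots stay ordered and non-overlapping, and every slot has length equal to the execution time because instances are nonpreemptable.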
Figure 5: Asynchronous communications in mutuality


and $k$ can be stated as

$$(j - 1) \times n < i < k \le j \times n. \qquad (8)$$

A graphical illustration can be found in Figure 5. In the example, the values of $i$, $j$, $k$, and $n$ are
6, 2, 8, 4 respectively. The communications $\tau_x^6 \mapsto \tau_y^2$ and $\tau_y^2 \mapsto \tau_x^8$ are scheduled before and after
the scheduling of $\tau_y^2$ respectively.

4  The  Simulated  Annealing  Algorithm 

Kirkpatrick et al. [KGV83] proposed a simulated annealing algorithm for combinatorial optimization
problems. Simulated annealing is a global optimization technique. It is derived from the
observation that an optimization problem can be identified with a fluid. There exists an analogy
between finding an optimal solution of a combinatorial problem with many variables and the slow
cooling of a molten metal until it reaches its low energy ground state. Hence, terms such as
energy function, temperature, and thermal equilibrium are commonly used. During the search for an
optimal solution, the algorithm always accepts downward moves from the current solution point
to points of lower energy values, while there is still a small chance of accepting upward moves
to points of higher energy values. The probability of accepting an uphill move is a function of
the current temperature. The purpose of hill climbing is to escape from a locally optimal configuration.
If there are no upward or downward moves over a number of iterations, thermal equilibrium
is reached. The temperature is then reduced to a smaller value and the search continues from
the current solution point. The whole process terminates when either (1) the lowest energy point
is found or (2) no upward or downward jumps have been taken for a number of successive thermal
equilibria.
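
The acceptance rule and cooling loop described above can be sketched generically. The text only says that the uphill acceptance probability depends on the temperature; the sketch assumes the standard Metropolis form $\exp(-\Delta E / T)$, replaces the thermal-equilibrium test with a fixed number of steps per temperature level, and treats zero as the lowest possible energy. All parameter values and names are illustrative.

```python
import math
import random

def accept(delta_energy, temperature, rng=random):
    """Metropolis rule: always accept downhill, accept uphill with prob exp(-dE/T)."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)

def anneal(energy, neighbor, start, t0=10.0, cooling=0.9, steps_per_level=50,
           levels=60, rng=random):
    """Generic annealing loop; all parameter values are illustrative."""
    current, e_cur = start, energy(start)
    temperature = t0
    for _ in range(levels):
        for _ in range(steps_per_level):
            if e_cur == 0:
                return current, e_cur   # lowest possible energy reached
            candidate = neighbor(current, rng)
            e_new = energy(candidate)
            if accept(e_new - e_cur, temperature, rng):
                current, e_cur = candidate, e_new
        temperature *= cooling          # cool down after each level
    return current, e_cur
```

Here `energy` and `neighbor` are caller-supplied functions; for the scheduling problem they would evaluate a solution point and perturb it, respectively.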

The  structure  of  simulated  annealing  (SA)  algorithm  is  shown  in  Figure  7.  The  first  step  of 
and wrapped to the beginning of the schedule. As shown in Figure 3, the start_time of the new
slot is $r$ while the completion time is $r + e - LCM$.
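
The wrap-around can be written as a small helper. This is a sketch with an assumed function name; it folds a slot into one frame by subtracting one LCM from whichever endpoint exceeds it, so a finish time smaller than the start time indicates a slot that wraps to the beginning of the schedule.

```python
def wrap_slot(start_time, exec_time, frame):
    """Fold a slot into one frame; finish < start signals a wrapped slot."""
    start = start_time - frame if start_time > frame else start_time
    finish = start + exec_time
    if finish > frame:
        finish -= frame
    return start, finish
```

With an LCM of 200, a slot that starts at 190 and runs for 20 time units is stored as (190, 10), matching $r + e - LCM$.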

3.3  Pseudo  Instances 

As stated in Section 2, we consider the communication pattern in which cyclic dependency exists
among tasks. Given a set of tasks, $\Gamma$, a set of task instances, $I$, a set of communications, $C$, and
any solution point, $(\phi, \sigma_m, \sigma_c)$, we introduce pseudo instances to solve this problem. For any task
$\tau_x$, if there exists a task $\tau_y$, in which (1) $\sigma_m(\tau_x^i) < \sigma_m(\tau_y^i)$, $\forall$ $i$, (2) $n_x = n_y$, and (3) $\tau_x \mapsto \tau_y \in C$
and $\tau_y \mapsto \tau_x \in C$, then a pseudo instance $\tau_x^{n_x+1}$ is added to $I$. A pseudo instance is always a
receiving instance. No insertion of time slots for pseudo instances is needed. For a pseudo instance,
only the effective start time is concerned. The effective start time of a pseudo instance $\tau_x^{n_x+1}$ in
the constructed schedule based on $(\phi, \sigma_m, \sigma_c)$ is checked to see whether it is less than $LCM + s_x^1$ or
not. If yes, then the execution of $\tau_x^1$ for the next scheduling frame may start at $LCM + s_x^1$, which
is exactly one LCM away from the execution of $\tau_x^1$ for the current scheduling frame. A graphical
illustration of the introduction of a pseudo instance to solve the synchronous communications of
cyclic dependency is given in Figure 4 in which $n_x = 2$.

As for the asynchronous communications of cyclic dependency, no pseudo instances are needed.
For example, if both $\tau_x \mapsto \tau_y$ and $\tau_y \mapsto \tau_x$ exist and $n_x = n_y \times n$, then for each $\tau_y^j$, where $j$ = 1,
2, ..., $n_y$, find a sending instance $\tau_x^i \in I$ and a receiving instance $\tau_x^k \in I$ such that (1) $f_x^i \le s_y^j$,
(2) $f_y^j \le s_x^k$, and (3) $\tau_x^i \mapsto \tau_y^j$ and $\tau_y^j \mapsto \tau_x^k$ are the communications. The relationship between $i$, $j$,
4.1.1 Priority Assignment of Task Instances: $\sigma_m$

In the work [CA93], we presented the SLsF algorithm and the performance evaluation. The
results showed that SLsF outperforms SPF and SJF. In this paper we use the SLsF as the priority
assignment algorithm for the task instances in $I$.

Formally, if $lst(\tau_i^j) < lst(\tau_k^l)$, then $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^l)$. The insertion of a time slot for
$\tau_i^j$ precedes that for $\tau_k^l$ if $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^l)$. The time-based scheduling algorithm for a task
instance is used to find a time slot for a task instance once the effective start time is given. We
define the effective start time of a task instance as the earliest start time when the incoming
communications are taken into account. Let $t$ be the maximum completion time among all the
incoming communications of a task instance; then the effective start time of the task instance is set
to the larger of $t$ and est (as stated in Equation 6).

4.1.2 Scheduling the Incoming Communications: $\sigma_c$

There are two kinds of incoming communications. The first kind is the synchronous
communication, in which the frequencies of the sender and receiver are identical. The other kind is
the asynchronous communication, in which the sending task instance is associated with a
question mark. For such an asynchronous communication, we have to decide which instance of the
sending task should communicate with the receiving task instance. The approach we take is to find
the nearest instance of the sending task. The reason is that, by finding the nearest instance, the
time difference between the start time of the receiving instance and the completion time of the sending
instance is the smallest, and so is the chance of violating the latency constraint of the
communication.

The nearest instance of a sending task can be found using the following method. Given an
incoming communication $\tau_k^{?} \to \tau_i^j$ and the effective start time of $\tau_i^j$, $eft$, we search through the
linked list of processor $\phi(\tau_k)$ up to time $eft$. If there is some instance of $\tau_k$, say $\tau_k^l$, whose completion
time is the latest among all scheduled instances of $\tau_k$, then the nearest instance is found. Otherwise,
we continue to search through the linked list until an instance of $\tau_k$ is found. We set the effective
start time of the communication to be the completion time of the found instance. We also erase
the question mark such that $\tau_k^{?} \to \tau_i^j$ is changed to $\tau_k^l \to \tau_i^j$. For the synchronous communication,
the effective start time of the communication is simply assigned as the finish time of the sending
task instance.
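
The nearest-instance search can be sketched as follows. This is only an illustration under stated assumptions: the processor's linked list is modelled as a Python list of slots sorted by start time, "scheduled up to time eft" is read as "completed no later than eft", and the names `Slot` and `nearest_sender_instance` are hypothetical, not taken from the report.

```python
from typing import List, Optional, Tuple

# Hypothetical slot layout: (task_id, instance_index, start, finish).
# The list stands in for one processor's linked list, sorted by start time.
Slot = Tuple[str, int, float, float]


def nearest_sender_instance(slots: List[Slot], sender: str, eft: float) -> Optional[Slot]:
    """Return the instance of `sender` nearest to time `eft`.

    Prefer the instance with the latest completion time not exceeding `eft`.
    If no instance has completed by then, keep searching and return the
    first instance of `sender` found later in the list.
    """
    best: Optional[Slot] = None
    for slot in slots:
        task, _, _, finish = slot
        if task != sender:
            continue
        if finish <= eft:
            if best is None or finish > best[3]:
                best = slot
        elif best is None:
            return slot  # nothing completed before eft: take the next instance
        else:
            break  # later instances cannot be nearer than `best`
    return best
```

With the found slot, the effective start time of the communication would be set to its finish time, as described above.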

The scheduling of the communication is done by inserting a time slot into the linked list for the
communications network. The start time of the time slot cannot be earlier than the effective start

the algorithm is to randomly choose an assignment $\phi$, a total ordering of instances within one
scheduling frame, $\sigma_m$, and a total ordering of communications for the instances, $\sigma_c$. A solution
point in the search space of SA is a 3-tuple $(\phi, \sigma_m, \sigma_c)$. The energy of a solution point is computed by
equation (5). For each solution point $P$ which is infeasible (i.e. $E_p$ is nonzero), a neighbor finding
strategy is invoked to generate a neighbor of $P$. As stated before, if the energy of the neighbor is
lower than the current value, we accept the neighbor as the current solution; otherwise, a probability
function (i.e. $\exp\left(\frac{E_p - E_n}{T}\right)$, where $E_n$ is the energy of the neighbor) is evaluated to determine
whether to accept the neighbor or not. The
parameter of the probability function is the current temperature. As the temperature decreases,
the chance of accepting an uphill jump (i.e. a solution point with a higher energy level) becomes smaller.
The inner and outer loops are for thermal equilibrium and termination respectively. The number of
iterations for the inner loop is also a function of the current temperature: the lower the temperature,
the larger the number. Methods for modeling the number of iterations and for
assigning the number for each temperature have been proposed [LH91]. In this dissertation, we
consider a simple incremental function, namely $N = N + \Delta$, where $N$ is the number of iterations
and $\Delta$ is a constant. The termination condition for the outer loop is $E_p = 0$. Whenever thermal
equilibrium is reached at a temperature, the temperature is decreased. The temperature decrease
function can be linear or nonlinear, simple or complex. Here we consider a simple multiplication
function (i.e. $T = T \times \alpha$, where $\alpha < 1$).
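
The loop structure described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the function name `anneal`, the default parameter values, and the `max_outer` safety bound are assumptions added for illustration, and `energy` and `neighbor` are caller-supplied stand-ins for Equation 5 and the neighbor finding strategy.

```python
import math
import random


def anneal(initial, energy, neighbor, t0=100.0, alpha=0.9,
           n0=10, delta=5, max_outer=1000, rng=None):
    """Simulated annealing with a growing inner loop and geometric cooling.

    energy(point) -> number, 0 meaning feasible.
    neighbor(point, rng) -> a neighboring point.
    max_outer is an assumed safety bound; the text only stops at energy 0.
    """
    rng = rng or random.Random(0)
    p, e_p = initial, energy(initial)
    temp, n_iter = t0, n0
    for _ in range(max_outer):
        if e_p == 0:                      # feasible solution found
            break
        for _ in range(n_iter):           # inner loop: thermal equilibrium
            n = neighbor(p, rng)
            e_n = energy(n)
            # Accept downhill moves always, uphill moves with
            # probability exp((E_p - E_n) / T).
            if e_n < e_p or math.exp((e_p - e_n) / temp) > rng.random():
                p, e_p = n, e_n
            if e_p == 0:
                break
        temp *= alpha                     # T = T x alpha, alpha < 1
        n_iter += delta                   # N = N + delta
    return p, e_p
```

A toy call such as `anneal(5, abs, lambda x, rng: x - 1 if x > 0 else x + 1)` walks the integer 5 down to the zero-energy point 0.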


4.1 Evaluation of Energy Value for a Solution Point $(\phi, \sigma_m, \sigma_c)$

The computation of the energy value stated in Equation 5 is done by constructing multi-processor
schedules and a network schedule, and collecting the start and completion times of each task
instance and communication from these schedules.

The construction of the schedules is characterized by the priority assignment of the task
instances in the set. The priority assignment algorithm determines the scheduling order among all
the task instances. Each time a task instance is chosen to be scheduled, the incoming
communications of the instance are scheduled first and then the task instance itself. After all the
task instances have been scheduled, the scheduling of the outgoing communications is performed.
An algorithmic description of how to compute the energy value for a solution point is given
in Figure 6. Note that a communication is an incoming communication to a task instance if the
frequency of the receiving task instance is equal to or less than that of the sending task instance.

For example, ^ ^ and are incoming communications to $\tau_i^j$. On the other hand, if
the sender frequency is less than the receiver frequency, then the communication is an outgoing
communication (e.g. ^ is the outgoing communication of t^).

time of the communication. Once the time slot is inserted, we check that the effective start time of $\tau_i^j$
is not less than the finish time of the time slot. If it is less, the effective start time
of $\tau_i^j$ is updated to be the finish time of the time slot.

If a task instance has more than one incoming communication, the scheduling order among these
communications is based on their latency constraints. The bigger the latency value is, the earlier
the communication is scheduled. The incoming communication with the tightest latency constraint
is scheduled last. This is because the effective start time of the receiving task instance is constantly
updated by the scheduling of the incoming communications. The scheduling of
the later incoming communications may increase the effective start time of the receiving task instance
and make an earlier scheduled communication violate its latency constraint if that constraint is tight.
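
The ordering rule for incoming communications can be illustrated with a small sketch. The `Network` class and `schedule_incoming` function are hypothetical names, and the network model is a deliberate simplification: slots are only appended after the last one, whereas the linked-list scheme in the text may also reuse earlier gaps.

```python
class Network:
    """Hypothetical stand-in for the network's linked list of time slots."""

    def __init__(self):
        self.busy_until = 0.0

    def insert(self, earliest, duration):
        # Place the slot no earlier than `earliest` and after the last slot.
        start = max(earliest, self.busy_until)
        self.busy_until = start + duration
        return start, self.busy_until


def schedule_incoming(comms, est, network):
    """Schedule the incoming communications of one receiving instance.

    comms: list of (effective_start, transmission_time, latency).
    Returns the receiver's effective start time and the placed slots.
    """
    eff = est
    placed = []
    # Biggest latency first, tightest latency last.
    for comm_start, duration, latency in sorted(comms, key=lambda c: c[2], reverse=True):
        start, finish = network.insert(comm_start, duration)
        placed.append((start, finish, latency))
        eff = max(eff, finish)  # receiver cannot start before its inputs arrive
    return eff, placed
```

Scheduling the tightest-latency communication last keeps the gap between its finish time and the receiver's final effective start time as small as possible.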


4.1.3  Scheduling  the  Outgoing  Communications: 

The  scheduling  of  the  outgoing  communications  for  the  whole  task  set  is  performed  after  all  the 
task  instances  have  been  scheduled.  The  scheduling  order  among  these  communications  is  based 
on  the  finish  times  of  the  sending  task  instances.  The  task  instance  with  the  smallest  finish  time  is 
considered  first.  When  a  task  instance  is  taken  into  account,  all  its  outgoing  communications  are 
scheduled  one  by  one  according  to  their  latency  constraints.  The  communication  with  the  tightest 
latency  constraint  is  scheduled  first. 

Given an outgoing communication $\tau_i^j \to \tau_k^{?}$ and the finish time of $\tau_i^j$, $f_i^j$, the effective start
time of the communication is set to be $f_i^j$. Based on the effective start time, a time slot is inserted
for this communication. Then the nearest instance of the receiving task can be found based on the
finish time of the time slot.

For the example shown in Figure 5, the incoming communication marked with "(1)" is scheduled
before the scheduling of . The sixth instance of r- is chosen as the nearest instance. As for the
outgoing communication marked with "(3)", it is scheduled after the scheduling of r|, r|, rl, and
rf . In this example, rj is the nearest instance of the outgoing communication.


4.2 Neighbor Finding Strategy: $\phi$

The neighbor finding strategy is used to find the next solution point once the current solution point
is evaluated as infeasible (i.e. the energy value is nonzero). The neighbor space of a solution point
is the set of points which can be reached by changing the assignment of one or two tasks. There
are several modes of neighbor finding strategy.

•  Balance Mode: We randomly move a task from the heaviest-loaded processor to the lightest-
loaded processor. This move tries to balance the workload of processors. By balancing the
workload, the chance to find a neighbor with a lower energy value is bigger.

•  Swap Mode: We randomly choose two tasks $\tau_i$ and $\tau_j$ on processors $p$ and $q$ respectively.
Then we change $\phi$ by setting $\phi(\tau_i) = q$ and $\phi(\tau_j) = p$.

•  Merge Mode: We pick two tasks and move them to one processor. By merging two tasks to
a processor, we increase the workload of the processor. There is an opportunity of increasing
the energy level of the new point by increasing the workload of the processor. The purpose of
the move is to perturb the system and allow the next move to escape from the local optimum.

•  Direct Mode: When the system is in a low-energy state, only a few tasks violate the jitter
or latency constraints. Under such a circumstance, it will be more beneficial to change the
assignment of these tasks instead of randomly moving other tasks. From the conducted
experiments, we find that this mode can accelerate the search for a feasible solution, especially
when the system is about to reach the equilibrium.
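
The four modes can be sketched as small functions on an assignment. This is an illustrative sketch only: the assignment is modelled as a dictionary from task name to processor, the workload of a task as a single weight, and all function names are hypothetical. How the set of violating tasks is obtained for the direct mode is not shown.

```python
import random


def loads(assign, weight, procs):
    """Total weight placed on each processor."""
    total = {p: 0.0 for p in procs}
    for task, proc in assign.items():
        total[proc] += weight[task]
    return total


def balance_move(assign, weight, procs, rng):
    """Move a random task from the heaviest to the lightest processor."""
    total = loads(assign, weight, procs)
    heavy = max(procs, key=lambda p: total[p])
    light = min(procs, key=lambda p: total[p])
    candidates = [t for t, p in assign.items() if p == heavy]
    new = dict(assign)
    if heavy != light and candidates:
        new[rng.choice(candidates)] = light
    return new


def swap_move(assign, rng):
    """Exchange the processors of two randomly chosen tasks."""
    a, b = rng.sample(sorted(assign), 2)
    new = dict(assign)
    new[a], new[b] = assign[b], assign[a]
    return new


def merge_move(assign, rng):
    """Move one randomly chosen task onto the processor of another."""
    a, b = rng.sample(sorted(assign), 2)
    new = dict(assign)
    new[b] = assign[a]
    return new


def direct_move(assign, violating, procs, rng):
    """Reassign one task that currently violates a constraint."""
    task = rng.choice(sorted(violating))
    others = [p for p in procs if p != assign[task]]
    new = dict(assign)
    if others:
        new[task] = rng.choice(others)
    return new
```

Each function returns a new assignment and leaves the current point untouched, so a rejected neighbor can simply be dropped.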

The selection of the appropriate mode to find a neighbor is based on the current system state.
Given a randomly generated initial state (i.e. solution point), the workload discrepancy between
the processors may be huge. Hence, in the early stage of the simulated annealing, the balance
mode is useful to balance the workload. After the processor workload is balanced out, the swap
mode and the merge mode are frequently used to find a lower energy state until the system reaches
a near-termination state. In the final stage of the annealing, the direct mode tries to find a feasible
solution. The whole process terminates when a feasible solution is found, in which the energy value
is zero.

5  Experimental  Results 

We implemented the algorithm as the framework of the allocator on MARUTI [GMK+91, MSA92,
SdSA94], a real-time operating system developed at the University of Maryland, and conducted
extensive experiments under various task characteristics. The tests involve the allocation of
real-time tasks on a homogeneous distributed system connected by a communication channel.

To test the practicality of the approach and show the significance of the algorithm, we consider a
simplified and sanitized version of a real problem. This was derived from actual development work,
and is therefore representative of the scheduling requirements of an actual avionics system. The
Boeing 777 Aircraft Information Management System (AIMS) is to run on a multiprocessor
system connected by a SafeBus (TM) ultra-reliable bus. The problem is to find the minimum
number of processors needed and to assign the tasks to these processors. The objective is to develop
an off-line non-preemptable schedule for each processor and one schedule for the SafeBus (TM)
ultra-reliable bus.

                    10-Proc    9-Proc     8-Proc     7-Proc     6-Proc
Exec. Time (Sec)    2369       5572       19774      36218      78647
= Hr:Min:Sec        0:39:29    1:32:52    5:29:34    10:03:38   21:50:47

Table 1: The execution times of the AIMS with different numbers of processors

The AIMS consists of 155 tasks and 951 communications between these tasks. The frequencies
of the tasks vary from 5 Hz to 40 Hz. The execution times of the tasks vary from 0 ms to 16.650 ms.
The NEI and XEI of a task $\tau_i$ are $p_i - 500\,\mu s$ and $p_i + 500\,\mu s$ respectively. Since $\delta = 1000\,\mu s = 1$ ms
< ^ the smallest-period-first scheduling algorithm can be used in this case. Tasks communicate
with others asynchronously and mutually. The transmission times for communications are in the
range from 0 µs to 447.733 µs. The latency constraints of the communications vary from 68.993 ms
to 200 ms. The LCM of these 155 tasks is 200 ms. When the whole system is extended, the total
number of task instances within one scheduling frame is 624 and the number of communications is
1580.

For a real problem of this size, pre-analysis is necessary. We calculate the resource
utilization index to estimate the minimum number of processors needed to run AIMS. The index
is defined as

$$\frac{\sum_{i}(e_i \times n_i)}{LCM}$$

where $e_i$ is the execution time of task $\tau_i$ and $n_i = \frac{LCM}{p_i}$. The obtained index for AIMS is 5.14. It means
there exist no feasible solutions for the AIMS if the number of processors in the multiprocessor
system is less than 6.
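
The index is straightforward to compute. The sketch below assumes each task is given as an (execution time, period) pair in the same time unit and that every period divides the LCM; the function names and the three-task data set are made up for illustration and are not the AIMS task set.

```python
import math


def utilization_index(tasks, lcm):
    """Sum of e_i * n_i over all tasks, divided by LCM, with n_i = LCM / p_i."""
    return sum(e * (lcm // p) for e, p in tasks) / lcm


def min_processors(tasks, lcm):
    """Lower bound on the processor count: the index rounded up."""
    return math.ceil(utilization_index(tasks, lcm))


# Made-up example: (execution time, period) in ms, LCM = 200 ms.
toy_tasks = [(10, 25), (30, 50), (40, 200)]
toy_index = utilization_index(toy_tasks, 200)
toy_bound = min_processors(toy_tasks, 200)
```

The same rounding applied to the reported AIMS index of 5.14 gives the lower bound of 6 processors stated above.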

The number of processors which the AIMS is allowed to run on is a parameter to the scheduling
problem. We start the AIMS scheduling problem with 10 processors. After a feasible solution is
found, we decrease the number of processors by one and solve the whole problem again. We run
the algorithm on a DECstation 5000. The execution time for the AIMS scheduling problem with
different numbers of processors is summarized in Table 1. The algorithm is able to find a feasible
solution of the AIMS with six processors, which is the minimum number of processors according
to the resource utilization index. The time to find such a feasible solution is less than one day
(approximately 22 hours).

5.1 Discussions


For feasible solutions of the AIMS with various numbers of processors, we calculate the processor
utilization ratio (PUR) of each processor. The processor utilization ratio for a processor $p$ is defined
as

$$\frac{\sum_{\phi(\tau_i) = p}(e_i \times n_i)}{LCM}$$

The results are shown in Figure 8. The ratios are sorted into a non-decreasing order given a fixed
number of processors. The algorithm generates the feasible solutions for the AIMS with 6, 7, 8, 9
and 10 processors respectively. For example, for the 6-processor case, the PURs for the heaviest-
loaded and lightest-loaded processors are 0.91 and 0.76 respectively. For the 10-processor case, the
PURs are 0.63 and 0.28 respectively. We find that the ratio difference between the heaviest-loaded
processor and the lightest-loaded processor in the 6-processor case is smaller than those in other
cases. It suggests that a more load-balanced allocation has a better chance of being feasible
when the number of processors is smaller.

The detailed schedules for the 6-processor case are shown in Figure 9. The results are shown
on an interactive graphical interface which was developed for the design of MARUTI. The time scale
shown in Figure 9 is 100 µs, so the LCM is shown as 2000 in the figure (i.e. 2000 × 100 µs =
200 ms). This solution consists of seven off-line non-preemptive schedules: one for each processor
and one for the SafeBus (TM). Each of these schedules is one LCM long, and an infinite
schedule can be produced by repeating these schedules indefinitely. Note that the pseudo instances
are introduced to make sure that wrapping around at the end of the LCM-long schedules
satisfies the latency and next-execution-interval requirements across the point of wrap-around. The
pseudo instances are not shown in Figure 9.

The inclusion of resource and memory constraints into the problem can be done by modifying
the neighbor-finding strategy. Once a neighbor of the current point is generated, it is checked to
ascertain that the constraints on memory etc. are met. If not, the neighbor is discarded and
another neighbor is evaluated.

Given a solution point P = ($\phi$, $\sigma_m$, $\sigma_c$)

While there is some unscheduled task instance do
    Find the next unscheduled instance.  /* By the SLsF algorithm */
    Let the instance be $\tau_i^j$.
    Sort all the incoming communications of $\tau_i^j$ based on
        the latency values into a descending order.
    Schedule each incoming communication starting from
        the biggest-latency one to the tightest-latency one.
    Schedule the instance $\tau_i^j$.
End While.

Mark each instance as un-examined.

While there is some un-examined task instance do
    Find the next un-examined task instance.  /* By the finish times */
    Sort all the outgoing communications of the task instance based
        on the latency values into an increasing order.
    Schedule each outgoing communication starting from
        the tightest-latency one to the biggest-latency one.
    Mark the task instance examined.
End While.

Collect the start time and finish time information for each task instance and communication.
Compute the energy value using Equation 5.

Figure 6: The pseudo code for computing the energy value

Choose an initial temperature T
Choose randomly a starting point P = ($\phi$, $\sigma_m$, $\sigma_c$)
E_p := Energy of solution point P
if E_p = 0 then
    output E_p and exit  /* E_p = 0 means a feasible solution */
end if
repeat
    repeat
        Choose N, a neighbor of P
        E_n := Energy of solution point N
        if E_n = 0 then
            output E_n and exit  /* E_n = 0 means a feasible solution */
        end if
        if E_n < E_p then
            P := N
            E_p := E_n
        else
            x := (E_p - E_n) / T
            if e^x > random(0,1) then
                P := N
                E_p := E_n
            end if
        end if
    until thermal equilibrium at T
    T := $\alpha$ × T  (where $\alpha$ < 1)
until stopping criterion


Figure 7: The structure of simulated annealing algorithm.


[Figure 8: Processor Utilization (vertical axis: Utilization Ratio)]


Figure 9: The Allocation Results and Schedules for AIMS with 6 processors


Scheduling of Periodic Tasks with Relative Timing Constraints


Sheng-Tzong Cheng and Ashok K. Agrawala
Institute for Advanced Computer Studies
Systems Design and Analysis Group
Department of Computer Science
University of Maryland
College Park, MD 20742
{stcheng,agrawala}@cs.umd.edu


Abstract 

The problem of non-preemptive scheduling of a set of periodic tasks on a single processor
has traditionally been studied by considering the ready time and deadline of each task. As a consequence,
a feasible schedule ensures that in each period one instance of each task starts its execution after
the ready time and completes it before the deadline.

Recently, real-time systems have emerged whose timing requirements impose relative timing
constraints on the consecutive executions of each task. In this paper, we consider
the scheduling problem of periodic tasks with relative timing constraints imposed on two
consecutive executions of a task. We analyze the timing constraints and derive the scheduling
window for each task instance. Based on the scheduling window, we present the time-based
approach of scheduling a task instance. The task instances are scheduled one by one based on
their priorities assigned by the algorithms proposed in this paper. We conduct experiments
to compare the schedulability of the algorithms.


Honeywell under N00014-91-C-0195 and Army/Phillips under DASG-60-92-C-0055.
The views, opinions, and/or findings contained in this report are those of the author(s) and should not be
interpreted as representing the official policies, either expressed or implied, of Honeywell or Army/Phillips.

1 Introduction


The task scheduling problem is one of the basic issues of building real-time applications in which the
tasks of applications are associated with timing constraints. For hard real-time applications,
such as avionics systems and nuclear power systems, the approach to guarantee the critical timing
constraints is to schedule periodic tasks a priori. A non-preemptive schedule for a set of periodic
tasks is generated by assigning a start time to each execution of a task to meet their timing
constraints. Failure to meet the specified timing constraints can result in disastrous consequences.

Various kinds of periodic task models have been proposed to represent the real-time system
characteristics. One of them is to model an application as a set of tasks, in which each task is
executed once every period under the ready time and deadline constraints. These constraints impose
constant intervals in which a task can be executed. In the literature, many techniques [2, 3, 4, 5, 6, 7, 8]
have been proposed to solve the scheduling problem in this context. The deficiency of this model
is its inability to specify relative constraints across task periods. For example, one cannot
specify the timing relationship between two consecutive executions of the same task.

Simply assuring that one instance of each task starts its execution after the ready time and
completes it before the specified deadline is not enough. Some real-time applications
have more complicated timing constraints for the tasks. For example, relative timing constraints
may be imposed upon the consecutive executions of a task, in which two consecutive
executions of a periodic task must be separated by a minimum execution interval. The Boeing 777
Aircraft Information Management System is such an example [1]. One possible solution to the
scheduling problem of such applications is to consider the instances of tasks rather than the tasks.
A task instance is defined as one execution of a task within a period. With the notion of task
instances, one is able to specify the various timing constraints and dependencies among instances
of tasks.

In this paper, we consider the relative timing constraints imposed on two consecutive instances
of a task. The task model and the analysis of the timing constraints are introduced in Sections 2
and 3 respectively. Based on the analysis, we are able to derive the scheduling window for each
task instance. Given the scheduling window of a task instance, we present the time-based approach
of scheduling a task instance in Section 4. We propose three priority assignment algorithms for the
task instances in Section 5. The task instances are scheduled one by one based on their priorities.
In Section 6, we evaluate the three algorithms and show the experimental results.

2 Problem Statement


Consider a set of periodic tasks $\Gamma = \{\tau_i \mid i = 1, \ldots, n\}$, where $\tau_i$ is a 4-tuple $\langle p_i, e_i, \lambda_i, \eta_i \rangle$
denoting the period, computation time, low jitter and high jitter respectively. One instance of a
task is executed each period. The execution of a task instance is non-preemptable. The start times
of two consecutive instances of task $\tau_i$ are at least $p_i - \lambda_i$ and at most $p_i + \eta_i$ apart.

In order to schedule periodic tasks, we consider the least common multiple (LCM) of all periods
of tasks. Let $n_i$ be the number of instances for task $\tau_i$ within a schedule of length LCM. Hence,
$n_i = \frac{LCM}{p_i}$. A schedule for a set of tasks is the mapping of each task $\tau_i$ to $n_i$ task instances and the
assigning of a start time $s_i^j$ to the j-th instance of task $\tau_i$, $\tau_i^j$, $\forall\, i = 1, \ldots, n$ and $j = 1, \ldots, n_i$. A
feasible schedule is a schedule in which the following conditions are satisfied for each task $\tau_i$:

$$f_i^j = s_i^j + e_i \tag{1}$$

$$s_i^{n_i+1} = s_i^1 + LCM \tag{2}$$

$$s_i^j \ge s_i^{j-1} + p_i - \lambda_i \tag{3}$$

$$s_i^j \le s_i^{j-1} + p_i + \eta_i \tag{4}$$

$\forall\, j = 2, \ldots, n_i + 1$.

The non-preemption scheduling discipline leads to Equation 1, where $f_i^j$ is the finish time of $\tau_i^j$.
Another condition for non-preemption scheduling is that given any $i$, $j$, $k$ and $l$, if $s_i^j < s_k^l$ then
$f_i^j \le s_k^l$. It means the schedules of any two instances do not overlap. The constructed schedule of
length LCM is invoked repeatedly by wrapping around the end point of the first schedule to the
start point of the next one. Hence, as shown in Equation 2, the start time of the first instance in
the next schedule is exactly one LCM away from that of the first schedule. Finally, Equations 3
and 4 specify the relative timing constraints between two consecutive instances of a task.
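
The feasibility conditions can be checked mechanically. The following is a small sketch under stated assumptions: the function names `jitter_feasible` and `no_overlap` are hypothetical, start times are given for one LCM-long frame, and the period is assumed to divide the LCM.

```python
def jitter_feasible(starts, p, lam, eta, lcm):
    """Check Equations 2, 3 and 4 for one task.

    starts: start times s^1 .. s^n of the task within one frame.
    """
    n = lcm // p
    if len(starts) != n:
        return False
    # Equation 2: the first instance of the next frame starts one LCM later.
    s = list(starts) + [starts[0] + lcm]
    for j in range(1, n + 1):
        gap = s[j] - s[j - 1]
        if gap < p - lam or gap > p + eta:   # Equations 3 and 4
            return False
    return True


def no_overlap(slots):
    """Check the non-preemption condition on one processor.

    slots: list of (start, execution_time); finish = start + execution_time
    (Equation 1). No instance may start before the previous one finishes.
    """
    ordered = sorted(slots)
    return all(ordered[k][0] + ordered[k][1] <= ordered[k + 1][0]
               for k in range(len(ordered) - 1))
```

For a task with period 40, both jitters 5 and LCM 200, the start times 4, 40, 77, 115, 159 pass the check, while 4, 40, 77, 113, 150 fail because the wrap-around gap to 204 is 54, which exceeds 45.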


3  Analysis  of  Relative  Timing  Constraints 

Define the scheduling window for a task instance as the time interval during which the task can
start. Traditionally, the lower and upper bounds of the scheduling window for a task instance are
called earliest start time (est) and latest start time (lst) respectively. These values are given and
independent of the start times of the preceding instances.

Instance ID   est = $s_i^{j-1} + p_i - \lambda_i$   lst = $s_i^{j-1} + p_i + \eta_i$   actual start time ($s_i^j$)
$\tau_i^1$    0                                     40                                 4
$\tau_i^2$    39                                    49                                 40
$\tau_i^3$    75                                    85                                 77
$\tau_i^4$    112                                   122                                113
$\tau_i^5$    148                                   158

Table 1: An example to show the wrong setting of scheduling windows


We consider the scheduling of periodic tasks with relative timing constraints described in
Equations 3 and 4. The scheduling window for a task instance is derived from the start times of its
preceding instances. A feasible scheduling window for a task instance $\tau_i^j$ is a scheduling window
in which any start time in the window makes the timing relation between $s_i^{j-1}$ and $s_i^j$ satisfy
Equations 3 and 4. Formally, given $s_i^1, s_i^2, \ldots,$ and $s_i^{j-1}$, the problem is to derive the feasible
scheduling window for $\tau_i^j$ such that a feasible schedule can be obtained if $\tau_i^j$ is scheduled within
the window.


For the sake of simplicity, we assume that $r_i = 0$ and $d_i = p_i$, $\forall\, i$, in this section. Then, simply
assigning the est and lst of $\tau_i^j$ as $s_i^{j-1} + p_i - \lambda_i$ and $s_i^{j-1} + p_i + \eta_i$ respectively, where $i = 1, 2, \ldots, n$
and $j = 1, 2, \ldots, n_i$, is not tight enough to guarantee a feasible solution. For example, consider
the case shown in Table 1 in which a periodic task $\tau_i$ is to be scheduled. Let LCM, $p_i$, $\lambda_i$ and $\eta_i$
be 200, 40, 5, and 5 respectively. Hence, there are 5 instances within one LCM (i.e. $n_i = 5$). The
first column in Table 1 indicates the instance IDs. The second and third columns give the est and
lst of the scheduling windows for the task instances specified in the first column. The last column
shows the actual start times scheduled for the particular task instances. The actual start time is
a value in between the est and lst of each task instance. For instance, the est and lst of $\tau_i^2$ are 39 and
49 respectively. It means $39 \le s_i^2 \le 49$. The scheduled value for $s_i^2$, in the example, is 40. Since
$s_i^6 = s_i^1 + LCM = 204$, we find that any value in the interval [148,158] cannot satisfy the relative
timing constraints between $\tau_i^5$ and $\tau_i^6$. As a consequence, the constructed schedule is infeasible.
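
The failure in Table 1 can be checked by hand. The short worked calculation below uses only the values already given ($s_i^1 = 4$, $LCM = 200$, $p_i = 40$, $\lambda_i = \eta_i = 5$) and applies Equations 3 and 4 to the pair $\tau_i^5$, $\tau_i^6$:

```latex
\begin{aligned}
s_i^6 &= s_i^1 + LCM = 4 + 200 = 204, \\
s_i^5 &\le s_i^6 - (p_i - \lambda_i) = 204 - 35 = 169 \quad \text{(from Equation 3)}, \\
s_i^5 &\ge s_i^6 - (p_i + \eta_i) = 204 - 45 = 159 \quad \text{(from Equation 4)}.
\end{aligned}
```

So $s_i^5$ would have to lie in [159, 169], which does not intersect the window [148, 158] obtained from $s_i^4 = 113$.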


We draw a picture to depict the relations among the start times of task instances in Figure 1.
When $\tau_i^j$ is taken into account, the scheduling window for $s_i^j$ is obtained by considering its relation
with $s_i^{j-1}$ as well as that with $s_i^{n_i}$ and $s_i^{n_i+1}$. We make sure that once $s_i^j$ is determined, the estimated
est and lst of $s_i^{n_i}$, based on $s_i^j$ and $s_i^{n_i+1}$, specify a feasible scheduling window for $s_i^{n_i}$. Namely, the
interval which is specified by the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$, overlaps the interval
$[s_i^{n_i+1} - (p_i + \eta_i),\ s_i^{n_i+1} - (p_i - \lambda_i)]$.

Figure 1: The relations between the task instances

Proposition 1: Let the est and lst of $\tau_i^j$ be

$$est(\tau_i^j) = \max\{(s_i^{j-1} + p_i - \lambda_i),\ (s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i)\}, \tag{5}$$

and

$$lst(\tau_i^j) = \min\{(s_i^{j-1} + p_i + \eta_i),\ (s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i)\}. \tag{6}$$

If $s_i^j$ is in between $est(\tau_i^j)$ and $lst(\tau_i^j)$, then the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$ and
$s_i^{n_i+1}$, specify a feasible window.

Proof: Let $\xi$ and $\rho$ be the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$, respectively.
Hence,

$$\xi = s_i^j + (n_i - j) \times (p_i - \lambda_i) \tag{7}$$

$$\rho = s_i^j + (n_i - j) \times (p_i + \eta_i) \tag{8}$$

To guarantee the existence of a feasible start time of $\tau_i^{n_i}$, the interval $[\xi, \rho]$ has to overlap the
interval $[s_i^{n_i+1} - (p_i + \eta_i),\ s_i^{n_i+1} - (p_i - \lambda_i)]$. Hence the following conditions have to be satisfied:

$$s_i^{n_i+1} - \xi \ge p_i - \lambda_i \tag{9}$$

$$s_i^{n_i+1} - \rho \le p_i + \eta_i \tag{10}$$

By replacing $\xi$ in Equation 9 with $s_i^j + (n_i - j) \times (p_i - \lambda_i)$, we obtain

$$\begin{aligned}
s_i^j &\le s_i^{n_i+1} - (n_i - j + 1) \times (p_i - \lambda_i) \\
&= s_i^1 + LCM - (n_i - j + 1) \times (p_i - \lambda_i) \\
&= s_i^1 + n_i \times p_i - (n_i - j + 1) \times (p_i - \lambda_i) \\
&= s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i
\end{aligned} \tag{11}$$

Likewise, by replacing $\rho$ in Equation 10 with $s_i^j + (n_i - j) \times (p_i + \eta_i)$, we have

$$\begin{aligned}
s_i^j &\ge s_i^{n_i+1} - (n_i - j + 1) \times (p_i + \eta_i) \\
&= s_i^1 + LCM - (n_i - j + 1) \times (p_i + \eta_i) \\
&= s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i
\end{aligned} \tag{12}$$

So, according to Equations 12 and 3, we choose the bigger value between $(s_i^{j-1} + p_i - \lambda_i)$ and
$(s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i)$ as the est of $\tau_i^j$. Similarly, according to Equations 11 and 4,
we assign the smaller value of $(s_i^{j-1} + p_i + \eta_i)$ and $(s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i)$ as the
lst.

□

Example 3.1: To show how Proposition 3 gives a tighter bound to find feasible scheduling windows, we consider the case shown in Table 1 again. We apply Equations 5 and 6 to compute the est and lst of each instance. The results are shown in Table 2. Note that the scheduling windows for $\tau_i^4$ and $\tau_i^5$ are tighter than those in Table 1. As a consequence, any start time in the interval [159,160] for $\tau_i^5$ satisfies the relative timing constraints between $\tau_i^5$ and $\tau_i^6$.

3.1  Property  of  Scheduling  Windows 

Define $P_i(x, y, z)$ as the predicate in which the estimated est and lst of $\tau_i^y$, based on $s_i^x$ and $s_i^z$, specify a feasible scheduling window for $\tau_i^y$. In Proposition 3, we prove that for any $s_i^j$ in between $\mathrm{est}(\tau_i^j)$ and $\mathrm{lst}(\tau_i^j)$ as specified in Equations 5 and 6, $P_i(j, n_i, n_i + 1)$ is true.
Instance ID    est from Equation 5    lst from Equation 6    actual start time ($s_i^j$)
$\tau_i^1$     0                      40                     4
$\tau_i^2$     39                     49                     40
$\tau_i^3$     75                     85                     77
$\tau_i^4$     114                    122                    115
$\tau_i^5$     159                    160                    159 ~ 160

Table 2: The correct setting of scheduling windows based on Proposition 3.1.
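
As a rough illustration of Equations 5 and 6, the following Python sketch recomputes the windows of instances 2 through 5 in Table 2. The function name `window` is hypothetical, and the parameter values $p_i = 40$, $\lambda_i = \eta_i = 5$, $n_i = 5$ are assumptions: they are not stated in this section and were chosen only because they reproduce the tabulated numbers.

```python
def window(j, s_prev, s_first, p, lam, eta, n):
    """Return (est, lst) of instance j >= 2 following Equations 5 and 6.

    s_prev  -- start time of instance j-1
    s_first -- start time of instance 1
    p       -- period; lam, eta -- the two jitter bounds (lambda, eta)
    n       -- number of instances of the task within one LCM
    """
    est = max(s_prev + p - lam, s_first + (j - 1) * p - (n - j + 1) * eta)
    lst = min(s_prev + p + eta, s_first + (j - 1) * p + (n - j + 1) * lam)
    return est, lst


# Assumed parameters (inferred from Table 2, not given in the text).
P, LAM, ETA, N = 40, 5, 5, 5
starts = [4, 40, 77, 115]  # actual start times of instances 1..4 in Table 2

# Window of instance j is computed once instance j-1 has been placed.
windows = [window(j, starts[j - 2], starts[0], P, LAM, ETA, N)
           for j in range(2, N + 1)]
```

Under these assumptions the second term of Equation 5 is the binding one for the last two instances, which is where the windows become tighter.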


Lemma 1 Given $s_i^1, s_i^2, \ldots,$ and $s_i^j$, if, $\forall\, k = 2, \ldots, j$, $\mathrm{est}(\tau_i^k) \le s_i^k \le \mathrm{lst}(\tau_i^k)$ as specified in Equations 5 and 6, then $P_i(j, y, n_i + 1)$ is true, $\forall\, y = j + 1, j + 2, \ldots, n_i$.

Proof: We prove that the estimated est and lst of $\tau_i^y$, based on $s_i^j$ and $s_i^{n_i+1}$, specify a feasible scheduling window, by showing that (1) the estimated scheduling window of $s_i^y$, based on $s_i^j$, is specified by the interval

$$[s_i^j + (y - j) \times (p_i - \lambda_i),\ s_i^j + (y - j) \times (p_i + \eta_i)], \tag{13}$$

(2) the estimated scheduling window of $s_i^y$, based on $s_i^{n_i+1}$, is specified by the interval

$$[s_i^{n_i+1} - (n_i - y + 1) \times (p_i + \eta_i),\ s_i^{n_i+1} - (n_i - y + 1) \times (p_i - \lambda_i)], \tag{14}$$

and (3) the intervals in Equations 13 and 14 overlap.

In Figure 2, we see that the necessary and sufficient conditions for the overlapping of the intervals specified in Equations 13 and 14 are

$$s_i^j + (y - j) \times (p_i - \lambda_i) \le s_i^{n_i+1} - (n_i - y + 1) \times (p_i - \lambda_i) \tag{15}$$

and

$$s_i^{n_i+1} - (n_i - y + 1) \times (p_i + \eta_i) \le s_i^j + (y - j) \times (p_i + \eta_i). \tag{16}$$

By solving Equations 15 and 16, we obtain

$$s_i^j \le s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i$$

and

$$s_i^j \ge s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i.$$

The above two inequalities describe the same conditions as Equations 11 and 12 do. Hence, $P_i(j, y, n_i + 1)$ is true, $\forall\, y = j + 1, j + 2, \ldots, n_i$.

□

Figure 2: The overlapping of two intervals

Lemma 2 Given $s_i^1, s_i^2, \ldots, s_i^j$, and an integer $n_0$, where $1 \le n_0 \le j$, if, $\forall\, k = 2, \ldots, j$, $\mathrm{est}(\tau_i^k) \le s_i^k \le \mathrm{lst}(\tau_i^k)$ are specified as in Equations 5 and 6, then $P_i(j, y, n_i + n_0)$ is true, $\forall\, y = j + 1, j + 2, \ldots, n_i$.

Proof: We use the same method as in Lemma 1 to prove it. We show that (1) the estimated scheduling window of $s_i^y$, based on $s_i^j$, is specified by the interval

$$[s_i^j + (y - j) \times (p_i - \lambda_i),\ s_i^j + (y - j) \times (p_i + \eta_i)], \tag{17}$$

(2) the estimated scheduling window of $s_i^y$, based on $s_i^{n_i+n_0}$, is specified by the interval

$$[s_i^{n_i+n_0} - (n_i + n_0 - y) \times (p_i + \eta_i),\ s_i^{n_i+n_0} - (n_i + n_0 - y) \times (p_i - \lambda_i)], \tag{18}$$

and (3) these two intervals overlap.

The following conditions have to be satisfied to ensure that the two intervals overlap:

$$s_i^j \le s_i^{n_0} + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i - (p_i - \lambda_i) \times (n_0 - 1) \tag{19}$$

and

$$s_i^j \ge s_i^{n_0} + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i - (p_i + \eta_i) \times (n_0 - 1). \tag{20}$$

Since $s_i^1 \le s_i^{n_0} - (p_i - \lambda_i) \times (n_0 - 1)$ and $s_i^1 \ge s_i^{n_0} - (p_i + \eta_i) \times (n_0 - 1)$, the bounds in Equations 11 and 12, which $s_i^j$ satisfies by assumption, imply Equations 19 and 20:

$$s_i^j \le s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i \le s_i^{n_0} + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i - (p_i - \lambda_i) \times (n_0 - 1)$$

and

$$s_i^j \ge s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i \ge s_i^{n_0} + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i - (p_i + \eta_i) \times (n_0 - 1).$$

Hence $P_i(j, y, n_i + n_0)$ holds for any $1 \le n_0 \le j$.

□


Theorem 1 Given $s_i^1, s_i^2, \ldots,$ and $s_i^j$, if, $\forall\, k = 2, \ldots, j$, $\mathrm{est}(\tau_i^k) \le s_i^k \le \mathrm{lst}(\tau_i^k)$ as specified in Equations 5 and 6, then $P_i(j, y, z)$ is true, $\forall\, y = j + 1, j + 2, \ldots, n_i$, and $z = n_i + 1, n_i + 2, \ldots, n_i + j$.

By combining the proofs in Lemmas 1 and 2, it is easy to see that Theorem 1 holds. Based on Theorem 1, we can assign the scheduling window for $\tau_i^j$ by using Equations 5 and 6 once $s_i^1, \ldots, s_i^{j-1}$ are determined.

Before we present the scheduling technique for a task instance, let us consider the following objective. Given a set of tasks with the characteristics described in Section 2, we schedule the task instances for each task within one LCM to minimize

$$\epsilon = \sum_{i} \sum_{j} \alpha(s_i^{j+1} - s_i^j - p_i) \tag{21}$$

subject to the constraints specified in Equations 1 through 4, where $\alpha(x) = x$ if $x \ge 0$, and $\alpha(x) = -x$ otherwise.

Basically,  we  try  to  schedule  every  instance  of  a  task  one  period  apart  from  its  preceding 
instance.  An  optimal  schedule  is  a  feasible  schedule  with  the  minimum  total  deviation  value  from 
one  period  apart  for  instances. 
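
The total deviation from "one period apart" can be sketched for a single task as follows. This is a minimal illustration, not the authors' code: the function name `total_deviation` is hypothetical, and treating the instance after the last one as the first instance of the next cycle (start time plus LCM) is an assumption about how the sum wraps around.

```python
def total_deviation(starts, period, lcm):
    """Sum of |s^{j+1} - s^j - p| over the instances of one task.

    starts -- start times of the task's instances within one LCM
    period -- the task period p
    lcm    -- the LCM of all task periods
    Assumption: the successor of the last instance is the first
    instance of the next cycle, i.e. starts[0] + lcm.
    """
    extended = list(starts) + [starts[0] + lcm]
    return sum(abs(nxt - cur - period)
               for cur, nxt in zip(extended, extended[1:]))
```

A perfectly periodic schedule such as `[0, 40, 80, 120, 160]` with period 40 and LCM 200 yields a deviation of 0; any drift between consecutive instances adds its absolute size to the total.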


4  The  Time-Based  Scheduling  of  a  Task  Instance 

We  consider  the  time-based  solution  to  the  scheduling  problem  by  using  a  linked  list.  Each  element 
in  the  list  represents  a  time  slot  assigned  to  a  task  instance.  A  time  slot  w  has  the  following  fields: 
(1) task id $i$ and instance id $j$ indicate the identifier of the time slot. (2) start time $st$ and finish time $ft$ indicate the start time and completion time of $\tau_i^j$ respectively. (3) prev ptr and next ptr are the pointers to the preceding and succeeding time slots respectively. We arrange the time slots in the list in increasing order by using the start time as the key. Any two time slots are non-overlapping. Since the execution of an instance is non-preemptable, the time difference between start time and finish time equals the execution time of the task.

Figure 3: Insertion of a new time slot

4.1  Creating  a  Time  Slot  for  the  Task  Instance 

Consider a set of n tasks. Given a linked list and a task instance $\tau_i^j$, we schedule the instance by inserting a time slot into the list. According to Equations 5 and 6, we compute $\mathrm{est}(\tau_i^j)$ and $\mathrm{lst}(\tau_i^j)$ first. Let $S$ be the set of unoccupied time intervals that overlap the interval $[\mathrm{est}(\tau_i^j), \mathrm{lst}(\tau_i^j)]$ in the linked list. The unoccupied time intervals in $S$ are collected by going through the list. Each time a pair of time slots $(w, w+1)$ is examined, we compute $\ell = \max\{\mathrm{est}(\tau_i^j), ft(w)\}$ and $\mu = \min\{\mathrm{lst}(\tau_i^j), st(w+1)\}$, where $ft(w)$ is the finish time of the time slot $w$, and $st(w+1)$ is the start time of the slot next to $w$. If $\ell \le \mu$, then we add the interval $[\ell, \mu]$ to $S$.

The free intervals in $S$ are the potential time slots which $\tau_i^j$ can be assigned to. Since we try to schedule $\tau_i^j$ as close to one period away from the preceding instance as possible, we sort $S$, based on the function of the lower bound of each interval, $\alpha(s_i^{j-1} + p_i - \ell)$, in ascending order. Without loss of generality, we assume that $S$ after the sorting is denoted by $\{int_1, int_2, \ldots, int_{|S|}\}$. The idea is that if $\tau_i^j$ is scheduled to $int_k$, then the value in Equation 21 will be smaller than that of the case in which $\tau_i^j$ is scheduled to $int_{k+1}$.

The scheduling of $\tau_i^j$ can be described as follows. Starting from $int_1$, we check whether the length of the interval is greater than or equal to the execution time of $\tau_i^j$. If yes, then we schedule the instance to the interval. One new time slot is created in which the start time is the lower bound of the interval and the finish time equals the start time plus the execution time. The created time slot is added to the linked list and the scheduling is done. If the length is smaller than the execution time, then we check the length of the next interval until all intervals are examined. An example is shown in Figure 3 in which the slot with dark area represents $\tau_i^j$. In this example we assume that

$\mathrm{est}(\tau_i^j) \le f_1$ and $s_2 - f_1 \ge e$. It means the free slot between the first and second occupied slots can be assigned to $\tau_i^j$.
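
The interval collection and first-fit placement described in this subsection can be sketched as below. This is an illustrative sketch under stated assumptions rather than the authors' implementation: a sorted Python list of `(start, finish)` pairs stands in for the linked list, the names `free_intervals` and `place` are hypothetical, and the gaps before the first slot and after the last slot are also examined by means of sentinel entries.

```python
def free_intervals(slots, est, lst):
    """Collect the unoccupied intervals [low, high] that overlap [est, lst].

    slots -- list of (start, finish) pairs sorted by start time.
    """
    inf = float("inf")
    # Sentinels (an assumption) expose the gaps at both ends of the list.
    bounds = [(-inf, -inf)] + list(slots) + [(inf, inf)]
    result = []
    for (_, ft_w), (st_next, _) in zip(bounds, bounds[1:]):
        low = max(est, ft_w)       # earliest start allowed in this gap
        high = min(lst, st_next)   # upper bound of this gap
        if low <= high:
            result.append((low, high))
    return result


def place(slots, est, lst, exec_time, s_prev, period):
    """Insert a slot for the instance; return (start, finish) or None."""
    candidates = free_intervals(slots, est, lst)
    # Prefer lower bounds closest to one period after the preceding instance.
    candidates.sort(key=lambda iv: abs(s_prev + period - iv[0]))
    for low, high in candidates:
        if high - low >= exec_time:
            slot = (low, low + exec_time)
            slots.append(slot)
            slots.sort()           # keep the list ordered by start time
            return slot
    return None                    # no interval fits; sliding would be needed
```

A return value of `None` corresponds to the case handled by the sliding technique of the next subsection.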


4.2  Sliding  of  the  Time  Slots 


In case none of the intervals in $S$ can accommodate a task instance $\tau_i^j$, the sliding technique is used to create a big enough interval by sliding the existing time slots in the list.

To make the sliding technique work, we maintain two values for each time slot: left laxity and right laxity. The value of left laxity indicates the amount of time units by which a time slot can be left-shifted to an earlier start time. Similarly, the right laxity indicates the amount of time units by which a time slot can be right-shifted to a later start time.

Given the time slots $w_{k-1}$, $w_k$, and $w_{k+1}$, where $a$ and $b$ are the task and instance identifiers of $w_k$ respectively, the laxity values of the time slot $w_k$ can be computed by:

$$\mathit{left\_laxity}(w_k) = \min\{s_a^b - est',\ s_a^b - ft(w_{k-1}) + \mathit{left\_laxity}(w_{k-1})\} \tag{22}$$

$$\mathit{right\_laxity}(w_k) = \min\{lst' - s_a^b,\ st(w_{k+1}) - f_a^b + \mathit{right\_laxity}(w_{k+1})\} \tag{23}$$

where $est' = \max\{\mathrm{est}(\tau_a^b),\ s_a^{b+1} - (p_a + \eta_a)\}$ and $lst' = \min\{\mathrm{lst}(\tau_a^b),\ s_a^{b+1} - (p_a - \lambda_a)\}$.

Note that the interval $[est', lst']$ defines the sliding range during which $\tau_a^b$ can start without violating the relative timing constraints. A schematic illustration of Equations 22 and 23 is given in Figure 4.

From Equations 22 and 23, we see that the computing of $\mathit{left\_laxity}(w_k)$ depends on that of $w_{k-1}$ and the computing of $\mathit{right\_laxity}(w_k)$ depends on that of $w_{k+1}$. It implies that a two-pass computation is needed to compute the laxity values for all time slots. The complexity is $O(2N)$ where $N$ is the number of time slots in the linked list.

Figure 4: An illustration of $\mathit{left\_laxity}(w_k)$ and $\mathit{right\_laxity}(w_k)$
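
The two-pass computation can be sketched as follows, reading the recurrences as: the left laxity of a slot is the smaller of its own slack to $est'$ and the gap to its predecessor plus the predecessor's left laxity, and symmetrically for the right laxity. The function name `laxities`, the dictionary keys, and the boundary rule (the first slot has no predecessor and the last slot no successor, so only their own slack applies) are assumptions made for this sketch.

```python
def laxities(slots):
    """Return (left, right) laxity lists for slots sorted by start time.

    Each slot is a dict with keys:
      st, ft       -- start and finish time of the slot
      est_p, lst_p -- the bounds est' and lst' of its sliding range
    """
    n = len(slots)
    left, right = [0] * n, [0] * n

    # Forward pass: left laxity depends on the preceding slot.
    for k in range(n):
        own = slots[k]["st"] - slots[k]["est_p"]
        if k == 0:
            left[k] = own                       # assumed boundary case
        else:
            gap = slots[k]["st"] - slots[k - 1]["ft"]
            left[k] = min(own, gap + left[k - 1])

    # Backward pass: right laxity depends on the succeeding slot.
    for k in range(n - 1, -1, -1):
        own = slots[k]["lst_p"] - slots[k]["st"]
        if k == n - 1:
            right[k] = own                      # assumed boundary case
        else:
            gap = slots[k + 1]["st"] - slots[k]["ft"]
            right[k] = min(own, gap + right[k + 1])

    return left, right
```

Each pass visits every slot once, which matches the two linear scans over the list mentioned above.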

The basic idea of the sliding technique is described as follows. Given a task instance $\tau_i^j$ and a set of unoccupied intervals, $S = \{int_1, int_2, \ldots, int_{|S|}\}$, we check one interval at a time to see if the interval can be enlarged by shifting the existing time slots. Two possible ways of enlargement are (1) shifting the time slots that precede the interval to the left or (2) shifting the slots that follow the interval to the right. The shifting depends on which direction minimizes the objective function in Equation 21.

4.3  The  Algorithm 

An algorithmic description of how to schedule a task instance, as described in Sections 4.1 and 4.2, is given in Table 3.

The procedures Left_Shift($w_k$, time_units) and Right_Shift($w_k$, time_units) in Table 3 may involve the shifting of more than one time slot recursively. For example, consider the case in Figure 4: if Right_Shift($w_k$, $lst' - s_a^b$) is invoked (i.e. $w_k$ is to be shifted right by $lst' - s_a^b$ time units), then $w_{k+1}$ has to be shifted too. It is because the gap between $w_k$ and $w_{k+1}$ is $st(w_{k+1}) - f_a^b$, which is smaller than $lst' - s_a^b$. In this case, Right_Shift($w_{k+1}$, $lst' - s_a^b - st(w_{k+1}) + f_a^b$) is invoked.

We do not enlarge an interval at both ends. Enlarging an interval at both ends requires shifting a certain number of preceding time slots to the left and some succeeding slots to the right. It is possible that some task instance $\tau_x^y$ is shifted left, while $\tau_x^{y+1}$ is shifted right. As a consequence, the timing constraints between $\tau_x^y$ and $\tau_x^{y+1}$ could be violated. For example, let $s_x^y$ and $s_x^{y+1}$ before the shifting be 10 and 20 respectively. The execution time for $\tau_x^y$ is 5 time units. Assume the left laxity of $\tau_x^y$ is 5 and the right laxity of $\tau_x^{y+1}$ is 5. It implies $s_x^{y+1} - s_x^y \le 15$. Consider the scheduling of a task instance $\tau_i^j$ with execution time 15. If we enlarge the interval between $\tau_x^y$ and $\tau_x^{y+1}$ by shifting $\tau_x^y$ left 5 time units and $\tau_x^{y+1}$ right 5 time units, then we get a new interval with 15 time units for $\tau_i^j$. However, it turns out that $s_x^{y+1} = 25$, $s_x^y = 5$, and the relative timing constraint between $\tau_x^y$ and $\tau_x^{y+1}$ is violated.

5  The  Priority-Based  Scheduling  of  a  Task  Set 

We consider the priority-based algorithms for scheduling a set of periodic tasks with hybrid timing constraints. Given a set of periodic tasks $\Gamma = \{\tau_i \mid i = 1, \ldots, n\}$ with the task characteristics described in Section 2, we compute the LCM of all periods. Each task $\tau_i$ is extended to $n_i$ task instances: $\tau_i^1, \tau_i^2, \ldots, \tau_i^{n_i}$. A scheduling algorithm $\sigma$ for $\Gamma$ is to totally order the instances of all tasks within the LCM. Namely, $\sigma$: task_id × instance_id → integer.

Three algorithms are considered. They are the smallest latest-start-time first (SLsF), smallest period first (SPF), and smallest jitter first (SJF) algorithms.

5.1  SLsF 

The scheduling window for a task instance $\tau_i^j$ depends on the scheduling of its preceding instance. Once $s_i^{j-1}$ is determined, the scheduling window of the instance can be computed by Equations 5 and 6. The scheduling window for the first instance of a task $\tau_i$ is defined as $[r_i, d_i - e_i]$.

The idea of the SLsF algorithm is to pick one candidate instance with the minimum lst among all tasks at a time. One counter for each task is maintained to indicate the candidate instance. All counters are initialized to 1. Each time a task instance with the smallest lst is chosen, the algorithm in Table 3 is invoked to schedule the instance. After the scheduling of the instance is done, the counter is increased by one. The counter for $\tau_i$ overflows when it reaches $n_i + 1$. It means that all the instances of $\tau_i$ are scheduled. The algorithm terminates when all counters overflow.

We can compute the relative deadline for a task instance by adding the execution time to the lst. If the execution times for all tasks are identical, the SLsF algorithm is equivalent to the earliest deadline first (EDF) algorithm.
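
The counter-driven selection loop of SLsF can be sketched as below. This is only an outline under assumptions: the name `slsf` and both callbacks are hypothetical, `latest_start` stands in for evaluating Equation 6 on the current candidate, and `schedule_instance` stands in for the procedure of Table 3.

```python
def slsf(instance_counts, latest_start, schedule_instance):
    """Order instances by smallest latest start time first.

    instance_counts   -- dict: task id -> number of instances n_i
    latest_start      -- callable (task, j) -> lst of candidate instance j
    schedule_instance -- callable (task, j); placeholder for Table 3
    Returns the list of (task, j) pairs in the order they were scheduled.
    """
    counters = {task: 1 for task in instance_counts}  # candidate per task
    order = []
    while True:
        # A counter has overflowed once it exceeds n_i.
        pending = [t for t in counters if counters[t] <= instance_counts[t]]
        if not pending:
            return order
        task = min(pending, key=lambda t: latest_start(t, counters[t]))
        schedule_instance(task, counters[task])
        order.append((task, counters[task]))
        counters[task] += 1
```

In a full scheduler the lst of a candidate changes after its predecessor is placed, so `latest_start` would need to read the current schedule rather than a fixed table.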

5.2  SPF 

The task periods determine the LCM of $\Gamma$ and the numbers of instances for tasks within the LCM. In most cases, the task with the smaller period has the tighter timing constraints. Namely, $(\lambda_i + \eta_i) \le (\lambda_j + \eta_j)$ if $p_i \le p_j$. To make the tasks with the smaller periods meet their timing constraints, the SPF algorithm favors the tasks with smaller periods.

The  SPF  algorithm  uses  the  period  as  the  key  to  arrange  all  tasks  in  non-decreasing  order.  The 
task  with  the  smallest  period  is  selected  to  schedule  first.  The  instances  of  a  particular  task  are 
scheduled  one  by  one  by  invoking  the  algorithm  in  Table  3.  After  all  the  instances  of  a  task  are 
scheduled,  the  next  task  in  the  sequence  is  scheduled. 


5.3  SJF 

We define the jitter of a task $\tau_i$ as $(\lambda_i + \eta_i)$. It is proportional to the range of the scheduling window. Hence, the schedulability of a task also depends on the jitter.

Instead  of  using  the  period  as  the  measurement,  the  SJF  algorithm  assigns  the  higher  priority 
to  the  tasks  with  the  smaller  jitters.  The  task  with  the  smallest  jitter  is  scheduled  first. 

5.4  The  Solution 

The composition of the time-based scheduling of a task instance and the priority assignment of task instances is shown in Figure 5. The priority assignment can be done by using SLsF, SPF, or SJF. The function Schedule_An_Instance() is invoked to schedule a single task instance.


6  Experimental  Evaluation 

We  conduct  two  experiments  to  study  and  compare  the  performance  of  the  three  algorithms.  The 
purpose  of  the  first  experiment  is  to  study  the  effect  of  the  number  of  tasks  and  utilization  on 
the schedulability of each algorithm. The objective of the second experiment is to compare the
performance  of  the  three  algorithms. 


6.1  The  First  Experiment 

The  task  generation  scheme  for  the  first  experiment  is  characterized  by  the  following  parameters. 

• Periods of the tasks: We consider a homogeneous system in which the period of one task
could  be  either  the  same  as  or  multiple  of  the  period  of  another.  We  consider  a  system  with 
40,  80,  160,  320,  and  640  as  the  candidate  periods.  There  may  be  more  than  one  task  with 
the  same  period. 

• The execution time of a task, $e_i$: It has the uniform distribution over a range that starts at 0 and is bounded in terms of $p_i$, where $p_i$ is the period of the task $\tau_i$. The execution time could be a real value.

• The jitters of a task: $\lambda_i = \eta_i = 0.1 \times p_i$.


We define the utilization of a task system as

$$U = \sum_{i=1}^{N} \frac{e_i}{p_i} \tag{24}$$


In the first experiment, the utilization value and the number of tasks in a set are the controlled variables. Given a utilization value $U$ and the number of tasks $N$, the scheme first generates a run of raw data by randomly generating a set of tasks based on the selected periods, jitter values, and the execution time distribution. The utilization of the raw data, $u$, is then computed by Equation 24. Finally, the utilization value of the raw data is scaled up or down to $U$ by multiplying the execution time of each generated task by $U/u$. As a consequence, we obtain a set of tasks with the specified $(U, N)$ value.
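
The generate-then-rescale scheme can be sketched as follows. The function name `generate_task_set` and the dictionary layout are hypothetical, and the upper bound of the execution-time distribution (taken here as the period itself) is an assumption, since the exact range is not legible in the source; the rescaling step makes the final utilization independent of that choice.

```python
import random


def generate_task_set(n_tasks, target_u,
                      periods=(40, 80, 160, 320, 640), seed=None):
    """Generate n_tasks tasks whose total utilization equals target_u."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        p = rng.choice(periods)
        e = rng.uniform(0.0, p)            # assumed range for raw data
        tasks.append({"p": p, "e": e, "lam": 0.1 * p, "eta": 0.1 * p})

    # Utilization of the raw data, as in Equation 24.
    raw_u = sum(t["e"] / t["p"] for t in tasks)

    # Scale every execution time by U/u to hit the target utilization.
    scale = target_u / raw_u
    for t in tasks:
        t["e"] *= scale
    return tasks
```

For example, `generate_task_set(10, 0.5, seed=1)` returns ten tasks whose utilizations sum to 0.5 up to floating-point rounding.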

For each combination of $(U, N)$ in which $U$ = 5%, 10%, 15%, ..., 100% and $N$ = 10, 20, and 30, we apply the scheme to generate 5000 cases of input data and use the three algorithms to solve them. The schedulability degree of each $(U, N)$ combination for an algorithm is obtained by dividing the number of solved cases by 5000. Since the jitter values are 1/10 of the periods, it is observed that the SPF and SJF algorithms yield the same results. The results are shown in Figure 6.

As can be seen in Figures 6(a) and (b), the number of tasks has different effects on the three algorithms. For SLsF, given a fixed utilization value, the schedulability degree increases as the number of tasks in a system becomes bigger. It is because the execution time of a task becomes smaller as the number of tasks increases. For a task system with a smaller execution time distribution, the chance for SLsF to find a feasible solution is bigger. The same phenomenon is also found in Figure 6(b) for SPF and SJF in the low-utilization cases (i.e. $U \le 20\%$). However, for the high-utilization cases in Figure 6(b), the complexity of the number of tasks dominates the algorithms and the schedulability decreases.

Figure 6: The effect of the numbers of tasks on the schedulability


6.2  The  Second  Experiment 


The task generation scheme for the second experiment is characterized by the following parameters.

• LCM = 300

• The number of tasks is 20.

• Periods of the tasks: We consider the factors of the LCM as the periods. They are 20, 30, 50, 60, 100, 150, and 300. There may be more than one task with the same period.

• The execution time of a task, $e_i$: It has the uniform distribution over a range defined in terms of $p_i$, where $p_i$ is the period of the task $\tau_i$. The execution time could be a real value.

• The jitters of a task: $\lambda_i = \eta_i = 0.1 \times p_i + 2 \times e_i$.


The generation scheme for the second experiment is similar to the first one. Given a utilization value $U$, a set of 20 tasks is randomly generated according to the parameters listed above and then the execution time of each task is normalized in order to make the utilization value equal to $U$ exactly.

We generate 5000 cases of different task sets for each utilization value ranging from 0.05 to 1.00. The schedulability degree of each algorithm on a particular utilization value is obtained by dividing the number of solved cases by 5000. We compare the schedulability degrees of the algorithms on different utilization values. The results are shown in Figure 7(a).


As can be seen in Figure 7(a), the SLsF algorithm outperforms the other two algorithms. For example, when the utilization is 50%, the schedulability degree of SLsF is 0.575 while those of SPF and SJF are less than 0.2. It is because the way of assigning the priorities to the task instances in the SLsF algorithm reflects the urgency of task instances by considering the latest start times.

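We also compare the objective function value $\epsilon$ in Equation 21 among the three algorithms. We define the normalized objective function for an algorithm as

$$\frac{1}{5000} \sum_{i=1}^{5000} v(i) \tag{25}$$

where

$$v(i) = \begin{cases}
1 & \text{if the algorithm can not find a feasible solution to case } i, \\
0 & \text{if } max(i) = min(i), \\
\dfrac{\epsilon(i) - min(i)}{max(i) - min(i)} & \text{otherwise,}
\end{cases}$$

and $\epsilon(i)$ denotes the objective value the algorithm obtains on case $i$.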

Given case $i$, the values of $min(i)$ and $max(i)$ are calculated among the objective values obtained from the algorithms which solve the case. For the algorithms which can not find a feasible solution to case $i$, the objective values are not taken into account when $min(i)$ and $max(i)$ are calculated. The results of the normalized objective functions for each algorithm on different utilization values are shown in Figure 7(b).

It is observed that in the low-utilization cases SJF finds feasible solutions with smaller objective values. It is because SJF schedules the tasks with the smallest jitters first. By scheduling the tasks with smaller jitter values first, it is easier to keep the instances of a task one period apart, so a feasible solution with a smaller objective value can be found. However, in the middle- or high-utilization cases, the schedulability dominates the normalized objective function, and SLsF outperforms the other two algorithms in these regions.

Figure 7: The comparison of three algorithms


7  Summary 

In this paper we have considered the static non-preemptive scheduling algorithm on a single processor for a set of periodic tasks with hybrid timing constraints. The time-based scheduling algorithm is used to schedule a task instance once the scheduling window of the instance is given. We also have presented three priority assignment algorithms for the task instances and conducted experiments to compare the performance. From the experimental results, we see that SLsF outperforms the other two algorithms.

The techniques presented in this chapter can be applied to multi-processor real-time systems. Communication and synchronization constraints can also be incorporated. In our future work, the extension to distributed computing systems will be investigated.
Input: a sequence of sorted free intervals $S = \{int_1, int_2, \ldots, int_{|S|}\}$ in which each interval overlaps $[\mathrm{est}(\tau_i^j), \mathrm{lst}(\tau_i^j)]$.

Let the execution time of $\tau_i^j$ be $e$.

For n = 1 to |S| do
    Let $int_n$ be $[\ell, \mu]$.
    If $\mu - \ell \ge e$ then
        start time = $\ell$ and finish time = $\ell + e$.
    End If.
End for.
...
        If right_laxity($w_r$) + $\mu - \ell \ge e$ then /* Right shift */
            Right_Shift($w_r$, $e - \mu + \ell$).
            start time = $\ell$ and finish time = $\ell + e$.
        End If.
    Else /* Try right shift first then left shift */
        Let the time slot that immediately follows $int_n$ be $w_r$.
        If right_laxity($w_r$) + $\mu - \ell \ge e$ then /* Right shift */
            Right_Shift($w_r$, $e - \mu + \ell$).
            start time = $\ell$ and finish time = $\ell + e$.
        Else
            Let the time slot that immediately precedes $int_n$ be $w_l$.
            If left_laxity($w_l$) + $\mu - \ell \ge e$ then /* Left shift */
                Left_Shift($w_l$, $e - \mu + \ell$).
                ...
            End If.
        End If.
    End If.
End for.
Schedule $\tau_i^j$ at the end of linked list.
Abstract

High-speed networks, such as ATM networks, are expected to support diverse quality-of-service (QoS) requirements, including real-time QoS. Real-time QoS is required by many applications such as voice and video. To support such service, routing protocols based on the Virtual Circuit (VC) model have been proposed. However, these protocols do not scale well to large networks in terms of storage and communication overhead.

In this paper, we present a scalable VC routing protocol. It is based on the recently proposed viewserver hierarchy, where each viewserver maintains a partial view of the network. By querying these viewservers, a source can obtain a merged view that contains a path to the destination. The source then sends a request packet over this path to set up a real-time VC through resource reservations. The request is blocked if the setup fails. We compare our protocol to a simple approach using simulation. Under this simple approach, a source maintains a full view of the network. In addition to the savings in storage, our results indicate that our protocol performs close to or better than the simple approach in terms of VC carried load and blocking probability over a wide range of real-time workload.
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tecture  and  Design  packet  networks^  store  and  forward  networks^  C.2.2  [Computer-Communication  Net¬ 
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Routing  Protocols]. 
1 Introduction

Integrated services packet-switched networks, such as Asynchronous Transfer Mode (ATM) networks [21], are expected to carry a wide variety of applications with heterogeneous quality of service (QoS) requirements. For this purpose, new resource allocation algorithms and protocols have been proposed, including link scheduling, admission control, and routing. Link scheduling defines how the link bandwidth is allocated among the different services. Admission control defines the criteria the network uses to decide whether to accept or reject a new incoming application. Routing concerns the selection of routes to be taken by application packets (or cells) to reach their destination. In this paper, we are mainly concerned with routing for real-time applications (e.g., voice, video) requiring QoS guarantees (e.g., bandwidth and delay guarantees).

To provide real-time QoS support, a number of virtual-circuit (VC) routing approaches have been proposed. A simple (or straightforward) approach to VC routing is the link-state full-view approach. Here, each end-system maintains a view of the whole network, i.e. a graph with a vertex for every node¹ and an edge between two neighbor nodes. QoS information such as delay, bandwidth, and loss rate is attached to the vertices and the edges of the view. This QoS information is flooded regularly to all end-systems to update their views. When a new application requests service from the network, the source end-system uses its current view to select a source route to the destination end-system that is likely to support the application's requested QoS, i.e., a sequence of node ids starting from the source end-system and ending with the destination end-system. A VC-setup message is then sent over the selected source route to try to reserve the necessary resources (bandwidth, buffer space, service priority) and establish a VC.

Typically, at every node the VC-setup message visits, a set of admission control tests are performed to decide whether the new VC, if established, can be guaranteed its requested QoS without violating the QoS guaranteed to already established VCs. At any node, if these admission tests are passed, then resources are reserved and the VC-setup message is forwarded to the next node. On the other hand, if the admission tests fail, a VC-rejected message is sent back towards the source node releasing resource reservations made by the VC-setup message, and the application request is either blocked or another source route is selected and tried. If the final admission tests at the destination node are passed, then a VC-established message is sent back towards the source node confirming resource reservations made during the forward trip of the VC-setup message. Upon receiving the VC-established message, the application can start transmitting its packets over its reserved VC. This VC is torn down and resources are released at the end of the transmission.

¹ We refer to switches and end-systems collectively as nodes.
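
The hop-by-hop reservation with rollback described above can be sketched as follows. This is a deliberately simplified illustration, not the protocol's specification: the function name `setup_vc` is hypothetical, and the admission test is reduced to a single bandwidth check per node, which is an assumption (the text also mentions buffer space and service priority).

```python
def setup_vc(route, free_bandwidth, demand):
    """Try to establish a VC along a source route.

    route          -- list of node ids from source to destination
    free_bandwidth -- dict: node id -> unreserved bandwidth (updated in place)
    demand         -- bandwidth requested by the new VC
    Returns True (VC-established) or False (VC-rejected).
    """
    reserved = []
    for node in route:
        if free_bandwidth[node] < demand:
            # Admission test failed: the VC-rejected message travels back
            # and releases the reservations made so far.
            for prev in reversed(reserved):
                free_bandwidth[prev] += demand
            return False
        free_bandwidth[node] -= demand   # reserve and forward VC-setup
        reserved.append(node)
    return True
```

On rejection, the caller can either block the request or call `setup_vc` again with another source route, mirroring the two options in the text.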

Clearly, the above simple routing scheme does not scale up to large networks. The storage at
each end-system and the communication cost are proportional to N × d, where N is the number of
nodes and d is the average number of neighbors of a node.

A traditional solution to this scaling problem is the area hierarchy used in routing protocols
such as the Open Shortest Path First (OSPF) protocol [18]. The basic idea is to aggregate nodes
hierarchically into areas: "close" nodes are aggregated into level 1 areas, "close" level 1 areas are
aggregated into level 2 areas, and so on. An end-system maintains a view that contains the nodes
in the same level 1 area, the level 1 areas in the same level 2 area, and so on. Thus an end-system
maintains a smaller view than it would in the absence of hierarchy. Each area has its own QoS
information derived from that of its subareas. A major problem of an area-based scheme is that
aggregation results in losing detailed link-level QoS information. This decreases the chance that the
routing algorithm chooses "good" routes, i.e. routes that result in a high successful VC setup rate
(or equivalently a high carried VC load).

Our  scheme 

In  this  paper,  we  present  a  scalable  VC  routing  scheme  that  does  not  suffer  from  the  problems  of 
areas.  Our  scheme  is  based  on  the  viewserver  hierarchy  we  recently  proposed  in  [3,  2]  for  large 
internetworks  and  evaluated  for  administrative  policy  constraints.  Here,  we  are  concerned  with  the 
support of performance/QoS requirements in large wide-area ATM-like networks, and we adapt our
viewserver  protocols  accordingly. 

In our scheme, views are not maintained by every end-system but by special switches called
viewservers. For each viewserver, there is a subset of nodes around it, referred to as the viewserver's
precinct. The viewserver only maintains the view of its precinct. This solves the scaling problem
for the storage requirement.

A viewserver can provide source routes for VCs between source and destination end-systems
in its precinct. Obtaining a route between a source and a destination that are not in any single
view involves accumulating the views of a sequence of viewservers. To make this process efficient,
viewservers are organized hierarchically in levels, and an associated addressing structure is used.
Each end-system has a set of addresses. Each address is a sequence of viewserver ids of decreasing
levels, starting at the top level and going towards the end-system. The idea is that when the views
of the viewservers in an address are merged, the merged view contains routes to the end-system
from the top level viewservers.

We handle dynamic topology changes such as node/link failures and repairs, and link cost
changes. Nodes detect topology changes affecting themselves and their neighbor nodes. Each node
communicates these changes by flooding to the viewservers in a specified subset of nodes; this subset
is referred to as its flood area. Hence, the number of packets used during flooding is proportional to
the size of the flood area. This solves the scaling problem for the communication requirement.

Thus our VC routing protocol consists of two subprotocols: a view-query protocol between
end-systems and viewservers for obtaining merged views; and a view-update protocol between nodes
and viewservers for updating views.

Evaluation 

In this paper, we compare our viewserver-based VC routing scheme to the simple scheme using
VC-level simulation. In our simulation model, we define network topologies, QoS requirements,
viewserver hierarchies, and evaluation measures. To the best of our knowledge, this is the first
evaluation of a dynamic hierarchical-based VC routing scheme under real-time workload.

Our evaluation measures are the amount of memory required at the end-systems, the amount
of time needed to construct a path², the carried VC load, and the VC blocking probability. We
use network topologies each of size 2764 nodes. Our results indicate that our viewserver-based VC
routing scheme performs close to or better than the simple scheme in terms of carried VC load
and blocking probability over a wide range of workloads. It also reduces the memory
requirement by up to two orders of magnitude.

Organization  of  the  paper 

In Section 2, we survey recent approaches to VC routing. In Section 3, we present the view-query
protocol for static network conditions, that is, assuming all links and nodes of the network remain
operational. In Section 4, we present the view-update protocol to handle topology changes. In
Section 5, we present our evaluation model. Our results are presented in Section 6. Section 7
concludes the paper.

² We use the terms route and path interchangeably.

2 Related Work

In  this  section,  we  discuss  routing  protocols  recently  proposed  for  packet-switched  QoS  networks. 
These  routing  protocols  can  be  classified  depending  on  whether  they  help  the  network  support 
qualitative QoS or quantitative (real-time) QoS. For a qualitative QoS, the network tries to provide
the  service  requested  by  the  application  with  no  performance  guarantees.  Such  a  service  is  often 
identified  as  “best-effort”.  A  quantitative  QoS  provides  performance  guarantees  (typically  required 
by  real-time  applications);  for  example,  an  upper  bound  on  the  end-to-end  delay  for  any  packet 
received  at  the  destination. 

Routing  protocols  that  make  routing  decisions  on  a  per  VC  basis  can  be  used  to  provide  either 
qualitative  or  quantitative  QoS.  For  a  quantitative  QoS,  some  admission  control  tests  should  be 
performed  during  the  VC-setup  message’s  trip  to  the  destination  to  try  to  reserve  resources  along 
the  VC’s  path  as  described  in  Section  1. 

On  the  other  hand,  the  use  of  routing  protocols  that  make  routing  decisions  on  a  per  packet 
basis  is  problematic  in  providing  resource  guarantees  [5],  and  qualitative  QoS  is  the  best  service 
the  network  can  offer. 

Since we are concerned in this paper with real-time QoS, we limit the following discussion to
VC routing schemes proposed or evaluated in this context. We refer the reader to [19, 6] for a good
survey of many other routing schemes.

Most of the VC routing schemes proposed for real-time QoS networks are based on the
link-state full-view approach described in Section 1 [6, 1, 10, 24]. Recall that in this approach, each
end-system maintains a view of the whole network, i.e. a graph with a vertex for every node and
an edge between two neighbor nodes. QoS information is attached to the vertices and the edges of
the view. This QoS information is distributed regularly to all end-systems to update their views
and thus enable the selection of appropriate source routes for VCs, i.e. routes that are likely to
meet the requested QoS. The proposed schemes mainly differ in how this QoS information is used.
Generally, a cost function is defined in terms of the QoS information, and used to estimate the
cost of a path to the VC's destination. The route selection algorithm then favors short paths with
minimum cost. See [17, 22] for an evaluation of several schemes.

A number of VC routing schemes have also been designed for networks using the Virtual Path
(VP) concept [15, 14]. This VP concept has been proposed to simplify network management and
control by having separate (logically) fully-connected subnetworks, typically one for each service
class. In each VP subnetwork, simple routing schemes that only consider one-hop and two-hop
paths are used. However, the advantage of using VPs can be offset by a decrease in statistical
multiplexing gains of the subnetworks [15]. In this work, we are interested in general network
topologies, where the shortest paths can be of arbitrary hop length and the overhead of routing
protocols is of much concern.

All the above VC routing schemes are based on the link-state approach. VC routing schemes
based on the path-vector approach have also been proposed [13]. In this approach, for each
destination a node maintains a set of paths, one through each of its neighbor nodes. QoS information
is attached to these paths. For each destination, a node exchanges its best feasible path³ with its
neighbor nodes. The scheme in [13] provides two kinds of routes: pre-computed and on-demand.
Pre-computed routes match some well-known QoS requirements, and are maintained using the
path-vector approach. On-demand routes are calculated for specific QoS requirements upon
request. In this calculation, the source broadcasts a special packet over all candidate paths. The
destination then selects a feasible path from them and informs the source [13, 23]. One drawback
of this scheme is that obtaining on-demand routes is very expensive since there is potentially
an exponential number of candidate paths between the source and the destination.

The link-state approach is often proposed and favored over the path-vector approach in QoS
architectures for several reasons [16]. An obvious reason is simplicity and complete control of the
source over QoS route selection.

The above VC routing schemes do not scale well to large QoS networks in terms of storage
and communication requirements. Several techniques to achieve scaling exist. The most common
technique is the area hierarchy described in Section 1.

The landmark hierarchy [26, 25] is another approach for solving the scaling problem. The
link-state approach cannot be used with the landmark hierarchy. A thorough study of enforcing QoS
and policy constraints with this hierarchy has not been done.

Finally, we should point out that extensive effort is currently underway to fully specify and
standardize VC routing schemes for the future integrated services Internet and ATM networks [9].

3  Viewserver  Hierarchy  Query  Protocol 

In this section, we present our scheme for static network conditions, that is, all links and nodes
remain operational. The dynamic case is presented in Section 4.

³ A feasible path is a path that satisfies the QoS constraints of the nodes in the path.

Conventions: Each node has a unique id. NodeIds denotes the set of node-ids. For a node u, we
use nodeid(u) to denote the id of u. NodeNeighbors(u) denotes the set of ids of the neighbors of u.

In our protocol, a node u uses two kinds of sends. The first kind has the form "Send(m) to v",
where m is the message being sent and v is the destination-id. Here, nodes u and v are neighbors,
and the message is sent over the physical link (u, v). If the link is down, we assume that the packet
is dropped.

The second kind of send has the form "Send(m) to v using sr", where m and v are as above
and sr is a source route between u and v. We assume that as long as there is a sequence of up
links connecting the nodes in sr, the message is delivered to v. This requires transport protocol
support such as TCP [20].

To implement both kinds of sends, we assume there is a reserved VC on each link for sending
routing,  signaling  and  control  messages  [4].  This  also  ensures  that  routing  messages  do  not  degrade 
the  QoS  seen  by  applications. 

Views  and  Viewservers 

Views are maintained by special nodes called viewservers. Each viewserver has a precinct, which is
a set of nodes around the viewserver. A viewserver maintains a view, consisting of the nodes in its
precinct, links between these nodes and links outgoing from the precinct⁴. Formally, a viewserver
x maintains the following:

Precinct_x ⊆ NodeIds. Nodes whose view is maintained.

View_x. View of x.
    = {(u, timestamp, expirytime, {(v, cost) : v ∈ NodeNeighbors(u)}) : u ∈ Precinct_x}

The intention of View_x is to obtain source routes between nodes in Precinct_x. Hence, the
choice of nodes to include in Precinct_x and the choice of links to include in View_x are not arbitrary.
Precinct_x and View_x must be connected; that is, between any two nodes in Precinct_x, there should
be a path in View_x. Note that View_x can contain links to nodes outside Precinct_x. We say that a
node u is in the view of a viewserver x, if either u is in the precinct of x, or View_x has a link from
a node in the precinct of x to node u. Note that the precincts and views of different viewservers
can be overlapping, identical or disjoint.
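
The state of a viewserver and the "in the view" test defined above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `ViewEntry`, `Viewserver` and `in_view` are hypothetical, and the cost is simplified to a single number although the paper's cost is a vector of QoS values.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ViewEntry:
    """One tuple (u, timestamp, expirytime, {(v, cost)}) of View_x, keyed by u."""
    timestamp: int
    expirytime: int
    # Neighbor id -> cost. Assumption: a scalar cost instead of a cost vector.
    costs: Dict[str, float] = field(default_factory=dict)


@dataclass
class Viewserver:
    node_id: str
    precinct: Set[str]                                   # Precinct_x
    view: Dict[str, ViewEntry] = field(default_factory=dict)  # View_x

    def in_view(self, u: str) -> bool:
        """u is in the view of x if u is in the precinct of x, or View_x
        has a link from a precinct node to u."""
        if u in self.precinct:
            return True
        return any(
            u in self.view[p].costs for p in self.precinct if p in self.view
        )
```

With this representation, a node outside the precinct is still "in the view" as soon as one precinct node lists it as a neighbor, which matches the remark that View_x can contain links to nodes outside Precinct_x.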

⁴ Not all the links need to be included.

For a link (u, v) in the view of a viewserver x, View_x stores a cost. The cost of the link (u, v)
equals a vector of values if the link is known to be up; each cost value estimates how expensive it
is to cross the link according to some QoS criterion such as delay, throughput, loss rate, etc. The
cost equals ∞ if the link is known to be down. The cost of a link changes with time (see Section 4).
The view also includes timestamp and expirytime fields, which are described in Section 4.

Viewserver  Hierarchy 

For scaling reasons, we cannot have one large view. Thus, obtaining a source route between a source
and a destination which are far away involves accumulating views of a sequence of viewservers. To
keep this process efficient, we organize viewservers hierarchically. More precisely, each viewserver is
assigned a hierarchy level from 0, 1, ..., with 0 being the top level in the hierarchy. A parent-child
relationship between viewservers is defined as follows:

1. Every level i viewserver, i > 0, has a parent viewserver whose level is less than i.

2. If viewserver x is a parent of viewserver y then x's precinct contains y and y's precinct
contains x.

3.  The  precinct  of  a  top  level  viewserver  contains  all  other  top  level  viewservers. 

In the hierarchy, a parent can have many children and a child can have many parents. We extend
the range of the parent-child relationship to ordinary nodes; that is, if Precinct_x contains the node
u, we say that u is a child of x, and x is a parent of u. We assume that there is at least one parent
viewserver for each node.

For a node u, an address is defined to be a sequence (x_0, x_1, ..., x_t) such that x_i for i < t is
a viewserver-id, x_0 is a top level viewserver-id, x_t is the id of u, and x_i is a parent of x_{i+1}. A
node may have many addresses since the parent-child relationship is many-to-many. If a source
node wants to establish a VC to a destination node, it first queries the name servers to obtain a
set of addresses for the destination⁵. Second, it queries viewservers to obtain an accumulated view
containing both itself and the destination node (it can reach its parent viewservers by using fixed
source routes to them). Then, it chooses a feasible source route from this accumulated view and
initiates the VC setup protocol on this path.
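
Because the parent-child relationship is many-to-many, a node can have several addresses. The sketch below enumerates them from a parent map. The function name `addresses` and the `parents`/`top_level` inputs are assumptions made for illustration, and the recursion relies on rule 1 above (a parent always has a strictly smaller level), so it terminates at top level viewservers.

```python
from typing import Dict, List, Set, Tuple


def addresses(node: str,
              parents: Dict[str, Set[str]],
              top_level: Set[str]) -> List[Tuple[str, ...]]:
    """Return every address (x_0, ..., x_t) of `node`: x_0 is a top level
    viewserver, x_t is the node itself, and x_i is a parent of x_{i+1}."""
    if node in top_level:
        # A top level viewserver starts an address.
        return [(node,)]
    result: List[Tuple[str, ...]] = []
    for p in sorted(parents.get(node, ())):
        # Extend every address of the parent with this node.
        for prefix in addresses(p, parents, top_level):
            result.append(prefix + (node,))
    return result
```

For example, an end-system `u` with two level-1 parents `v1` and `v2` that share the top level parent `t` gets the two addresses `(t, v1, u)` and `(t, v2, u)`.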

View-Query  Protocol:  Obtaining  Source  Routes 

We  now  describe  how  a  source  route  is  obtained. 

⁵ Querying the name servers can be done in the same way as is done currently in the Internet.

We want a sequence of viewservers whose merged views contain both the source and the
destination nodes. Addresses provide a way to obtain such a sequence, by first going up in the
viewserver hierarchy starting from the source node and then going down in the viewserver hierarchy
towards the destination node. More precisely, let (s_0, ..., s_t) be an address of the source, and
(d_0, ..., d_l) be an address of the destination. Then, the sequence (s_{t-1}, ..., s_0, d_0, ..., d_{l-1})
meets our requirements. In fact, going up all the way in the hierarchy to top level viewservers may not
be necessary. We can stop going up at a viewserver s_i if there is a viewserver d_j, j < l, in the view
of s_i (one special case is where s_i = d_j).
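
The up-then-down sequence can be sketched as below. The function name `viewserver_sequence` is hypothetical, and the sketch only implements the special case s_i = d_j of the early stop; the general test (d_j in the view of s_i) needs the views themselves and is left out.

```python
from typing import List, Sequence


def viewserver_sequence(s_address: Sequence[str],
                        d_address: Sequence[str]) -> List[str]:
    """Viewservers to visit for a source address (s_0, ..., s_t) and a
    destination address (d_0, ..., d_l)."""
    up = list(reversed(s_address[:-1]))   # s_{t-1}, ..., s_0
    down = list(d_address[:-1])           # d_0, ..., d_{l-1}
    # Early stop for the special case s_i = d_j: switch from climbing to
    # descending at the first viewserver shared by both addresses.
    for i, s in enumerate(up):
        if s in down:
            j = down.index(s)
            return up[: i + 1] + down[j + 1:]
    # No shared viewserver: go all the way up, then all the way down.
    return up + down
```

When the two addresses share no viewserver, the result is exactly (s_{t-1}, ..., s_0, d_0, ..., d_{l-1}); when both end-systems sit under the same lowest-level viewserver, the sequence collapses to that single viewserver.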

The  view-query  protocol  uses  two  message  types: 

• (RequestView, s_address, d_address)

where s_address and d_address are the addresses for the source and the destination,
respectively. A RequestView message is sent by a source node to obtain an accumulated view
containing both the source and the destination nodes. When a viewserver receives a RequestView
message, it either sends back its view or forwards this request to another viewserver.

• (ReplyView, s_address, d_address, accumview)

where s_address and d_address are as above and accumview is the accumulated view. A
ReplyView message is sent by a viewserver to the source or to another viewserver closer to
the source. The accumview field in a ReplyView message equals the union of the views of
the viewservers the message has visited.
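
The union that builds accumview can be sketched as a dictionary merge. The representation (node id mapped to a `(timestamp, costs)` pair) and the name `merge_views` are assumptions. Since precincts may overlap, the same node can appear in two views; keeping the entry with the larger timestamp is a choice made for this sketch, not something the text specifies.

```python
from typing import Dict, Tuple

View = Dict[str, Tuple[int, Dict[str, float]]]  # node -> (timestamp, costs)


def merge_views(accum: View, view: View) -> View:
    """Return the union of two views without modifying either input."""
    merged: View = dict(accum)
    for node, (ts, costs) in view.items():
        # Assumption: on overlap, prefer the more recent information.
        if node not in merged or ts > merged[node][0]:
            merged[node] = (ts, dict(costs))
    return merged
```

A viewserver handling a ReplyView message would call this once with the accumview carried in the packet and its own view.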

We  now  describe  the  view-query  protocol  in  more  detail  (please  refer  to  Figures  1  and  2).  To 
establish  a  VC  to  a  destination  node,  the  source  node  sends  a  RequestView  packet  containing  the 
source  and  the  destination  addresses  to  its  parent  in  the  source  address. 

Upon receiving a RequestView packet, a viewserver x checks if the destination node is in its
precinct⁶. If it is, x sends back its view in a ReplyView packet. If it is not, x forwards the request
packet to another viewserver as follows (details in Figure 2): x checks whether any viewserver in
the destination address is in its view. If there is such a viewserver, x sends the RequestView packet
to the last such one in the destination address. Otherwise x is a viewserver in the source address,
and it sends the packet to its parent in the source address.

When a viewserver x receives a ReplyView packet, it merges its view into the accumulated view
in the packet. Then it sends the ReplyView packet towards the source node in the same way it
would send a RequestView packet towards the destination node (i.e. the roles of the source address
and the destination address are interchanged).

When the source receives a ReplyView packet, it chooses a feasible path using the accumview
in the packet. If it does not find a feasible path, it can try again using different source and/or
destination addresses. Note that the source does not have to throw away the previous accumulated
views; it can merge them all into a richer accumulated view. In fact, it is easy to change the protocol
so that the source can also obtain views of individual viewservers to make the accumulated view
even richer. Once a feasible source route is found, the source node initiates the VC setup protocol.

⁶ Even though the destination can be in the view of x, its QoS characteristics are not in the view if
it is not in the precinct of x.

Constants

    FixedRoutes_u(x), for every viewserver-id x such that x is a parent of u,
        = {(y_1, ..., y_n) : y_i ∈ NodeIds}. Set of routes to x.

Events

    RequestView_u(s_address, d_address)  {Executed when u wants a source route}
        Let s_address be (s_0, ..., s_{t-1}, s_t), and sr ∈ FixedRoutes_u(s_{t-1});
        Send(RequestView, s_address, d_address) to s_{t-1} using sr

    Receive_u(ReplyView, s_address, d_address, accumview)
        Choose a feasible source route using accumview;
        If a feasible route is not found
            Execute RequestView_u again with another source address and/or destination address

Figure 1: View-query protocol: Events and state of a source node u.

Constants

    Precinct_x. Precinct of x.

Variables

    View_x. View of x.

Events

    Receive_x(RequestView, s_address, d_address)
        Let d_address be (d_0, ..., d_l);
        if d_l ∉ Precinct_x then
            forward_x(RequestView, s_address, d_address, {});
        else forward_x(ReplyView, d_address, s_address, View_x);  {addresses are switched}
        endif

    Receive_x(ReplyView, s_address, d_address, view)
        forward_x(ReplyView, s_address, d_address, view ∪ View_x)

    where procedure forward_x(type, s_address, d_address, view)
        Let s_address be (s_0, ..., s_t), d_address be (d_0, ..., d_l);
        if ∃i : d_i in View_x then
            Let i = max{j : d_j in View_x};
            target := d_i;
        else target := s_i such that s_{i+1} = nodeid(x);
        endif;
        sr := choose a route to target from nodeid(x) using View_x;
        if type = RequestView then
            Send(RequestView, s_address, d_address) to target using sr;
        else Send(ReplyView, s_address, d_address, view) to target using sr;
        endif

Figure 2: View-query protocol: Events and state of a viewserver x.
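
The target selection inside the forward procedure of Figure 2 can be sketched as follows. The name `choose_target` and the `in_view` callback are assumptions standing in for a lookup in View_x; route selection and the actual Send are omitted.

```python
from typing import Callable, Sequence


def choose_target(x: str,
                  s_address: Sequence[str],
                  d_address: Sequence[str],
                  in_view: Callable[[str], bool]) -> str:
    """Pick the next recipient for a message handled by viewserver x.

    in_view(n) is assumed to report whether node n is in the view of x.
    """
    # Prefer the last element of the destination address that x can see.
    visible = [d for d in d_address if in_view(d)]
    if visible:
        return visible[-1]
    # Otherwise x lies on the source address: climb to its parent there.
    i = s_address.index(x)  # raises ValueError if x is not on the address
    if i == 0:
        raise ValueError("a top level viewserver has no parent in the address")
    return s_address[i - 1]
```

For a ReplyView message the same function applies with the two addresses swapped, which mirrors the "addresses are switched" call in Figure 2.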

Above we have described one possible way of obtaining the accumulated views. There are
various other possibilities, for example: (1) restricting the ReplyView packet to take the reverse
of the path that the RequestView packet took; (2) having ReplyView packets go all the way
up in the viewserver hierarchy for a richer accumulated view; (3) having the source poll the
viewservers directly instead of the viewservers forwarding request/reply messages to each other;
(4) not including non-transit nodes (e.g. end-systems) other than the source and the destination
nodes in the accumview; (5) including some QoS requirements in the RequestView packet, and
having the viewservers filter out some nodes and links.

4  Update  Protocol  for  Dynamic  Network  Conditions 

In  this  section,  we  first  describe  how  topology  changes  such  as  link/node  failures,  repairs  and  cost 
changes,  are  detected  and  communicated  to  viewservers,  i.e.  the  view-update  protocol.  Then,  we 
modify  the  view-query  protocol  appropriately. 

View-Update  Protocol:  Updating  Views 

Viewservers  do  not  communicate  with  each  other  to  maintain  their  views.  Nodes  detect  and 
communicate topology changes to viewservers. Updates are done periodically and also optionally
after  a  change  in  the  outgoing  link  costs. 

The  communication  between  a  node  and  viewservers  is  done  by  flooding  over  a  set  of  nodes. 
This  set  is  referred  to  as  the  flood  area.  The  topology  of  a  flood  area  must  be  a  connected  graph. 
For  efficiency,  the  flood  area  can  be  implemented  by  a  hop-count. 

Due to the nature of flooding, a viewserver can receive information out of order from a node. In
order to avoid old information replacing new information, each node includes successively increasing
timestamps in the messages it sends. For each node, the timestamp field in the view of a viewserver
equals the largest timestamp received from that node.


Due to node and link failures, communication between a node and a viewserver can fail, resulting
in the viewserver having out-of-date information. To eliminate such information, a viewserver
deletes any information about a node if it is older than a time-to-die period. The expirytime field
in the view of a viewserver equals the end of the time-to-die period for a node. We assume that
nodes send messages more often than the time-to-die value (to avoid false removal).

The view-update protocol uses one type of message, as follows:

• (Update, nid, timestamp, floodarea, ncostset)
is sent by the node to inform the viewservers about current costs of its outgoing links. Here,
nid and timestamp indicate the id and the timestamp of the node, ncostset contains a cost
for each outgoing link of the node, and floodarea is the set of nodes that this message is to
be sent over.

Constants:

    FloodArea_g (⊆ NodeIds). The flood area of the node.

Variables:

    Clock_g : Integer. Clock of g.

Figure 3: State of a node g.

The state maintained by a node g is listed in Figure 3. We assume that consecutive reads of
Clock_g return increasing values.

Constants:

    Precinct_x. Precinct of x.

    TimeToDie_x : Integer. Time-to-die value.

Variables:

    View_x. View of x.

    Clock_x : Integer. Clock of x.

Figure 4: State of a viewserver x.

The  state  maintained  by  a  viewserver  x  is  listed  in  Figure  4. 

The events of node g are specified in Figure 5. The events of a viewserver x are specified in
Figure 6. When a viewserver x recovers, View_x is set to {}. Its view becomes up-to-date as it
receives new information from nodes (and removes false information with the time-to-die period).

Update_g  {Executed periodically and also optionally upon a change in outgoing link costs}
    ncostset := compute costs for each outgoing link;
    flood_g((Update, nodeid(g), Clock_g, FloodArea_g, ncostset));

Receive_g(packet)  {an Update packet}
    flood_g(packet)

where procedure flood_g(packet)
    if nodeid(g) ∈ packet.floodarea then
        {remove g from the flood area to avoid infinite exchange of the same message.}
        packet.floodarea := packet.floodarea − {nodeid(g)};
        for all h ∈ NodeNeighbors(g) ∧ h ∈ packet.floodarea do
            Send(packet) to h;
    endif

Node Failure Model: A node can undergo failures and recoveries at any time. We assume failures are
fail-stop (i.e. a failed node does not send erroneous messages).

Figure 5: View-update protocol: Events of a node g.
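
The flooding rule of Figure 5 can be simulated on a small graph to see which nodes an Update reaches and how many sends it costs. This is a sketch under assumptions: the function name `flood` and the adjacency-map input are hypothetical, links are assumed to be up, and each message copy carries its own shrinking flood area, as in the figure.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def flood(origin: str,
          flood_area: Iterable[str],
          neighbors: Dict[str, List[str]]) -> Tuple[Set[str], int]:
    """Return (nodes that processed the packet, number of Send calls)."""
    reached: Set[str] = set()
    sends = 0
    # Each queue entry is one packet copy: (receiving node, its floodarea).
    queue = deque([(origin, frozenset(flood_area))])
    while queue:
        g, area = queue.popleft()
        if g not in area:
            continue  # g was already removed from this copy's flood area
        reached.add(g)
        area = area - {g}  # avoid infinite exchange of the same message
        for h in neighbors.get(g, ()):
            if h in area:
                sends += 1
                queue.append((h, area))
    return reached, sends
```

On a three-node chain a-b-c, flooding from `a` over the area {a, b, c} reaches all three nodes with two sends, while restricting the area to {a, b} stops the packet at `b`.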


Receive_x(Update, nid, ts, FloodArea, ncset)
    if nid ∈ Precinct_x then
        if ∃(nid, timestamp, expirytime, ncostset) ∈ View_x ∧ ts > timestamp then
            {received is more recent; delete the old one}
            delete (nid, timestamp, expirytime, ncostset) from View_x;
        endif
        if ¬∃(nid, timestamp, expirytime, ncostset) ∈ View_x then
            ncostset := subset of edge-cost pairs in ncset that are in View_x;
            insert (nid, ts, Clock_x + TimeToDie_x, ncostset) to View_x;
        endif
    endif

Delete_x  {Executed periodically to delete entries older than the time-to-die period}
    for all (nid, tstamp, expirytime, ncset) ∈ View_x ∧ expirytime < Clock_x do
        delete (nid, tstamp, expirytime, ncset) from View_x;

Viewserver Failure Model: A viewserver can undergo failures and recoveries at any time. We assume
failures are fail-stop. When a viewserver x recovers, View_x is set to {}.

Figure 6: View-update events of a viewserver x.
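
The two viewserver events of Figure 6 can be sketched as a small class. The class name `ViewserverState` and its method names are hypothetical; the clock is passed in as `now` to keep the sketch deterministic; the expiry time is assumed to be the current clock plus the time-to-die value; and, as a simplification, all received edge-cost pairs are stored rather than only those belonging to the view.

```python
from typing import Dict, Iterable, Tuple

Entry = Tuple[int, int, Dict[str, float]]  # (timestamp, expirytime, ncostset)


class ViewserverState:
    def __init__(self, precinct: Iterable[str], time_to_die: int) -> None:
        self.precinct = set(precinct)
        self.time_to_die = time_to_die
        self.view: Dict[str, Entry] = {}  # View_x, keyed by node id

    def receive_update(self, nid: str, ts: int,
                       ncset: Dict[str, float], now: int) -> None:
        """Handle (Update, nid, ts, ..., ncset) received at clock value `now`."""
        if nid not in self.precinct:
            return
        old = self.view.get(nid)
        if old is not None and ts > old[0]:
            del self.view[nid]  # received is more recent; delete the old one
        if nid not in self.view:
            self.view[nid] = (ts, now + self.time_to_die, dict(ncset))

    def delete_expired(self, now: int) -> None:
        """Periodic Delete event: drop entries past their expiry time."""
        for nid in [n for n, (_, exp, _) in self.view.items() if exp < now]:
            del self.view[nid]
```

An update carrying an older timestamp than the stored one is ignored, which is how out-of-order delivery caused by flooding is tolerated.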


Changes to View-Query Protocol

We  now  enumerate  the  changes  needed  to  adapt  the  view-query  protocol  to  the  dynamic  case  (the 
formal  specification  is  omitted  for  space  reasons). 

Due to link and node failures, RequestView and ReplyView packets can get lost. Hence, the
source may never receive a ReplyView packet after it initiates a request. Thus, the source should
try again after a time-out period.

When a viewserver receives a RequestView message, it should reply with its view only if the
destination node is in its precinct and its view contains a path to the destination. Similarly, during
forwarding of RequestView and ReplyView packets, a viewserver, when checking whether a node
is in its view, should also check if its view contains a path to it.

5  Evaluation 

In this section, we present the parameters of our simulation model. We use this model to
compare our viewserver-based VC routing protocols to the simple approach. The results obtained are
presented in Section 6.

Network  Parameters 

We model a campus network which consists of a campus backbone subnetwork and several
department subnetworks. The backbone network consists of backbone switches and backbone links.

Each department network consists of a hub switch and several non-hub switches. Each non-hub
switch has a link to the department's hub switch, and the department's hub switch has a link to
one of the backbone switches. A non-hub switch can have links to other non-hub switches in the
same department, to non-hub switches in other departments, or to backbone switches.

End-systems  are  connected  to  non-hub  switches.  An  example  network  topology  is  shown  in 
Figure  7. 

In our topology, there are 8 backbone switches and 32 backbone links. There are 16 departments.
There is one hub switch in each department. There is a total of 240 non-hub switches randomly
assigned to different departments. There are 2500 end-systems which are randomly connected to
non-hub switches. Thus, we have a total of 8 + 16 + 240 + 2500 = 2764 nodes.

In  addition  to  the  links  connecting  non-hub  switches  to  the  hub  switches  and  hub  switches  to 
the  backbone  switches,  there  are  720  links  from  non-hub  switches  to  non-hub  switches  in  the  same 
department,  there  are  128  links  from  non-hub  switches  to  non-hub  switches  in  different  departments, 
and there are 64 links from non-hub switches to backbone switches.
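
The counts given above can be tallied as a quick consistency check. This is only a sketch: the dictionary names are illustrative, and the assumption that every end-system has exactly one access link to a non-hub switch is ours, not something the text states.

```python
# Tally of the topology parameters quoted in the text.
nodes = {
    "backbone_switches": 8,
    "hub_switches": 16,          # one hub switch in each of the 16 departments
    "non_hub_switches": 240,
    "end_systems": 2500,
}

links = {
    "backbone_to_backbone": 32,
    "non_hub_to_hub": nodes["non_hub_switches"],   # each non-hub switch links to its hub
    "hub_to_backbone": nodes["hub_switches"],      # each hub links to one backbone switch
    "non_hub_same_department": 720,
    "non_hub_other_department": 128,
    "non_hub_to_backbone": 64,
    # Assumption (not in the text): one access link per end-system.
    "end_system_to_non_hub": nodes["end_systems"],
}

total_nodes = sum(nodes.values())
total_links = sum(links.values())
print(total_nodes)  # 2764, matching the total stated in the text
print(total_links)
```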

The end-points of each link are chosen randomly. However, we make sure that the backbone
network is connected, and that there is a link from node u to node v iff there is a link from
node v to node u.

Figure 7: An example network topology. (Legend: backbone switches, hub switches, non-hub
switches, end-systems.)


Each  link  has  a  total  of  C  units  of  bandwidth. 

QoS  and  Workload  Parameters 

In our evaluation model, we assume that a VC requires the reservation of a certain amount of
bandwidth that is enough to ensure an acceptable QoS for the application. This reservation amount
can be thought of either as the peak transmission rate of the VC or its “effective bandwidth” [12]
varying between the peak and average transmission rate.

VC setup requests arrive to the network according to a Poisson process of rate λ, each requiring
one unit of bandwidth. Each VC, once it is successfully set up, has a lifetime of exponential
duration with mean 1/μ. The source and the destination end-systems of a VC are chosen randomly.
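
The workload described above can be sketched as follows. This is a minimal illustration, not the authors' simulator: the function name, the fixed seed, and the choice to force the source and destination to be distinct are assumptions.

```python
import random

def generate_vc_requests(n, arrival_rate, mean_lifetime, end_systems, seed=0):
    """Return n VC setup requests as (arrival_time, lifetime, source, destination)."""
    rng = random.Random(seed)
    requests = []
    t = 0.0
    for _ in range(n):
        # Poisson arrivals: inter-arrival times are exponential with rate lambda.
        t += rng.expovariate(arrival_rate)
        # Exponential lifetime with mean 1/mu, i.e. rate mu = 1 / mean_lifetime.
        lifetime = rng.expovariate(1.0 / mean_lifetime)
        # Assumption: source and destination are two distinct end-systems.
        src, dst = rng.sample(end_systems, 2)
        requests.append((t, lifetime, src, dst))
    return requests

requests = generate_vc_requests(1000, arrival_rate=0.5, mean_lifetime=15000.0,
                                end_systems=list(range(2500)))
```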

An  arriving  VC  is  admitted  to  the  network  if  at  least  one  feasible  path  between  its  source  and 
destination  end-systems  is  found  by  the  routing  protocol,  where  a  feasible  path  is  one  that  has  links 
with  non-zero  available  capacity.  From  the  set  of  feasible  paths,  a  minimum  hop  path  is  used  to 
establish  the  VC;  one  unit  of  bandwidth  is  allocated  on  each  of  its  links  for  the  lifetime  of  the  VC. 
On the other hand, if a feasible path is not found, then the arriving VC is blocked and lost.
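
The admission rule can be illustrated with a breadth-first search restricted to links that still have capacity. This is a sketch under assumptions: capacity is tracked per directed link in a dictionary, the function names are ours, and releasing bandwidth when a VC terminates is left out.

```python
from collections import deque

def min_hop_feasible_path(adj, capacity, src, dst):
    """BFS over links with non-zero available capacity; returns a node list or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and capacity[(u, v)] > 0:
                parent[v] = u
                queue.append(v)
    return None

def admit_vc(adj, capacity, src, dst):
    """Admit a VC on a minimum-hop feasible path, reserving one unit on each link."""
    path = min_hop_feasible_path(adj, capacity, src, dst)
    if path is None:
        return None  # no feasible path: the VC is blocked and lost
    for u, v in zip(path, path[1:]):
        capacity[(u, v)] -= 1
    return path

# Hypothetical five-node example with one unit of bandwidth per link.
adj = {"s": ["a", "b"], "a": ["s", "d"], "b": ["s", "c"],
       "c": ["b", "d"], "d": ["a", "c"]}
capacity = {(u, v): 1 for u in adj for v in adj[u]}
print(admit_vc(adj, capacity, "s", "d"))  # ['s', 'a', 'd']
print(admit_vc(adj, capacity, "s", "d"))  # ['s', 'b', 'c', 'd']
print(admit_vc(adj, capacity, "s", "d"))  # None
```

Because BFS explores nodes in order of hop count, the first path found among the feasible ones has the fewest hops.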

We assume that the available link capacities in the views of the viewservers are updated
instantaneously whenever a VC is admitted to the network or terminates.


Viewserver Hierarchy Schemes

We have evaluated our viewserver protocol for several different viewserver hierarchies and query
methods. We next describe the different viewserver schemes evaluated. Please refer to Figure 7 in
the following discussion.

The first viewserver scheme is referred to as base. Each switch is a viewserver. A viewserver’s
precinct consists of itself and the neighboring nodes. The links in the viewserver’s view consist of
the links between the nodes in the precinct, and links outgoing from nodes in the precinct to nodes
not in the precinct. For example, the precinct of viewserver u consists of nodes u, v, w, s.

As for the viewserver hierarchy, a backbone switch is a level 0 viewserver, a hub switch is a
level 1 viewserver and a non-hub switch is a level 2 viewserver. The parent of a hub switch
viewserver is the backbone switch viewserver it is connected to. The parent of a non-hub switch
viewserver is the hub switch viewserver in its department. The parent of an end-system is the
non-hub switch viewserver it is connected to.

We use only one address for each end-system. The viewserver-address of an end-system is the
concatenation of four ids. Thus, the address of s is z.v.u.s. Similarly, the address of d is z.v.x.d.
To obtain a route between s and d, it suffices to obtain views of viewservers u, v, x.

The second viewserver scheme is referred to as base-QT (where the QT stands for “query up
to top”). It is identical to base except that during the query protocol all the viewservers in the
source and the destination addresses are queried. That is, to obtain a route between s and d, the
views of u, v, x, z are obtained.
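
The addresses and query sets in the example can be reproduced from the parent relation. This is a sketch of one reading that is consistent with the example (query up to the lowest common viewserver for base, and up to the top for base-QT); the function names are hypothetical and it counts distinct viewservers only.

```python
def ancestors(node, parent):
    """Viewservers above a node, nearest first."""
    chain = []
    node = parent.get(node)
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain

def viewserver_address(node, parent):
    """Concatenate ids from the top-level viewserver down to the node."""
    return ".".join(list(reversed(ancestors(node, parent))) + [node])

def viewservers_to_query(src, dst, parent, up_to_top=False):
    a, b = ancestors(src, parent), ancestors(dst, parent)
    if up_to_top:                            # base-QT style: every viewserver in both addresses
        return set(a) | set(b)
    common = next(x for x in a if x in b)    # lowest viewserver shared by both addresses
    return set(a[:a.index(common) + 1]) | set(b[:b.index(common)])

# Parent relation implied by the addresses z.v.u.s and z.v.x.d in the text.
parent = {"s": "u", "u": "v", "v": "z", "d": "x", "x": "v"}
print(viewserver_address("s", parent))   # z.v.u.s
print(viewserver_address("d", parent))   # z.v.x.d
print(sorted(viewservers_to_query("s", "d", parent)))                  # ['u', 'v', 'x']
print(sorted(viewservers_to_query("s", "d", parent, up_to_top=True)))  # ['u', 'v', 'x', 'z']
```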

The third viewserver scheme is referred to as vertex-extension. It is identical to base except
that viewserver precincts are extended as follows: Let P denote the precinct of a viewserver in the
base scheme. For each node n in P, if there is a link from node n to node r and r is not in P, node
r is added to the precinct; among r’s links, only the ones to nodes in P are added to the view. In
the example, the nodes adjacent to the viewserver’s base precinct are added to its precinct, but the
outgoing links of these added nodes to other nodes are not included. The advantage of this scheme
is that even though it increases the precinct size by a factor of d (where d is the average number of
neighbors to a node), it increases the number of links stored in the view by a factor less than 2.
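
The two precinct definitions can be compared on a small graph. This is a sketch under assumptions: the adjacency-list representation, the function names, and the six-node example graph are ours and are not taken from Figure 7.

```python
def base_view(adj, vs):
    """Base scheme: precinct = viewserver plus neighbors; view = links leaving precinct nodes."""
    precinct = {vs} | set(adj[vs])
    links = {(n, r) for n in precinct for r in adj[n]}
    return precinct, links

def vertex_extension_view(adj, vs):
    """Vertex-extension: add neighbors of precinct nodes, keeping only their links back into P."""
    P, links = base_view(adj, vs)
    added = {r for n in P for r in adj[n]} - P
    extra = {(r, n) for r in added for n in adj[r] if n in P}
    return P | added, links | extra

# Hypothetical symmetric graph.
adj = {"u": ["v", "w"], "v": ["u", "z"], "w": ["u", "q"],
       "z": ["v", "p"], "q": ["w"], "p": ["z"]}
base_precinct, base_links = base_view(adj, "u")
ext_precinct, ext_links = vertex_extension_view(adj, "u")
print(sorted(base_precinct), len(base_links))  # ['u', 'v', 'w'] 6
print(sorted(ext_precinct), len(ext_links))    # ['q', 'u', 'v', 'w', 'z'] 8
```

In this example the link (z, p) stays out of the extended view because p is outside the base precinct.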

The fourth viewserver scheme is referred to as vertex-extension-QT. It is identical to
vertex-extension except that during the query protocol all the viewservers in the source and the
destination addresses are queried.


6  Numerical  Results 

6.1  Results  for  Network  1 

The parameters of the first network topology, referred to as Network 1, are given in Section 5. The
link  capacity  C  is  taken  to  be  20  [6],  i.e.  a  link  is  capable  of  carrying  20  VCs  simultaneously. 

Our  evaluation  measures  were  computed  for  a  (randomly  chosen  but  fixed)  set  of  100,000  VC 
setup  requests.  Table  1  lists  for  each  viewserver  scheme  (1)  the  minimum,  average  and  maximum 
of  the  precinct  sizes  (in  number  of  nodes),  (2)  the  minimum,  average  and  maximum  of  the  merged 
view  sizes  (in  number  of  nodes),  and  (3)  the  minimum,  average  and  maximum  of  the  number  of 
viewservers queried.


Scheme                 Precinct Size       Merged View Size      No. of Viewservers Queried
base                   5 / 16.32 / 28      4 / 56.46 / 81        1 / 5.49 / 6
base-QT                5 / 16.32 / 28      27 / 59.96 / 81       6 / 6.00 / 6
vertex-extension       22 / 88.11 / 288    14 / 155.86 / 199     1 / 5.49 / 6
vertex-extension-QT    22 / 88.11 / 288    113 / 163.28 / 199    6 / 6.00 / 6

Table 1: Precinct sizes, merged view sizes, and number of viewservers queried for Network 1
(each entry is minimum / average / maximum).


The precinct size indicates the memory requirement at a viewserver. More precisely, the memory
requirement at a viewserver is O(precinct size × d), except for the vertex-extension and
vertex-extension-QT schemes. In these schemes, the memory requirement is increased by a factor less
than two. Hence these schemes have the same order of viewserver memory requirement as the base
and base-QT schemes.

The merged view size indicates the memory requirement at a source end-system during the
query protocol; i.e. the memory requirement at a source end-system is O(merged view size × d)
except for the vertex-extension and vertex-extension-QT schemes. Note that the source end-system
does not need to store information about end-systems other than itself and the destination. The
numbers in Table 1 take advantage of this.

The number of viewservers queried indicates the communication time required to obtain the
merged view at the source end-system. Hence, the “real-time” communication time required to
obtain the merged view at a source is slightly more than one round-trip time between the source
and the destination.

As  is  apparent  from  Table  1,  using  a  QT  scheme  increases  the  merged  view  size  by  about  6%, 
and  the  number  of  viewservers  queried  by  about  9%.  Using  the  vertex-extension  scheme  increases 
the  merged  view  size  by  about  3  times  (note  that  the  amount  of  actual  memory  needed  increases 
only  by  a  factor  less  than  2). 
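
The percentages quoted above follow from the averages in Table 1. The short check below uses only those averages; the variable names are illustrative.

```python
# Averages taken from Table 1 (Network 1).
avg_merged_view = {"base": 56.46, "base-QT": 59.96,
                   "vertex-extension": 155.86, "vertex-extension-QT": 163.28}
avg_queried = {"base": 5.49, "base-QT": 6.00}

qt_view_increase = (avg_merged_view["base-QT"] / avg_merged_view["base"] - 1) * 100
qt_query_increase = (avg_queried["base-QT"] / avg_queried["base"] - 1) * 100
vx_view_ratio = avg_merged_view["vertex-extension"] / avg_merged_view["base"]

print(f"{qt_view_increase:.1f}%")   # 6.2%
print(f"{qt_query_increase:.1f}%")  # 9.3%
print(f"{vx_view_ratio:.2f}x")      # 2.76x
```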

The  above  measures  show  the  memory  and  time  requirements  of  our  protocols.  They  clearly 
indicate  the  savings  in  storage  over  the  simple  approach  as  manifested  by  the  smaller  view  sizes.  To 
answer  whether  the  viewserver  hierarchy  finds  many  feasible  paths,  other  evaluation  measures  such 
as the carried VC load and the percent VC blocking are of interest. They are defined as follows:

•  Carried  VC  load  is  the  average  number  of  VCs  carried  by  the  network. 

•  Percent  VC  blocking  is  the  percentage  of  VC  setup  requests  that  are  blocked  due  to  the  fact 
that  a  feasible  path  is  not  found. ^ 
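
Both measures can be computed from a simulation trace. The sketch below assumes a hypothetical trace format of (setup_time, lifetime) pairs for admitted VCs and a fixed observation window; neither is specified in the text.

```python
def percent_blocking(num_requests, num_blocked):
    """Percentage of VC setup requests that were blocked."""
    return 100.0 * num_blocked / num_requests

def carried_load(admitted, t_start, t_end):
    """Time-average number of VCs in progress during [t_start, t_end].

    admitted: iterable of (setup_time, lifetime) pairs for admitted VCs.
    """
    busy = 0.0
    for setup, lifetime in admitted:
        begin = max(setup, t_start)
        end = min(setup + lifetime, t_end)
        if end > begin:
            busy += end - begin      # VC-time contributed inside the window
    return busy / (t_end - t_start)

print(percent_blocking(200, 30))                  # 15.0
print(carried_load([(0, 10), (5, 10)], 0, 20))    # 1.0
```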

In our experiments, we keep the average VC lifetime (1/μ) fixed at 15000 and vary the arrival
rate of VC setup requests (λ). Figure 8 shows the carried VC load versus λ for the simple approach
and the viewserver schemes. Figure 9 shows the percent VC blocking versus λ. At low values of λ,
all the viewserver schemes are very close to the simple approach. At moderate values of λ, the base
and base-QT schemes perform badly. The vertex-extension and vertex-extension-QT schemes are
still very close to the simple approach (only 3.4% less carried VC load). Note that the performance
of the viewserver schemes can be further improved by trying more viewserver addresses.

Surprisingly, at high values of λ, all the viewserver schemes perform better than the simple
approach. At λ = 0.5, the network with the base scheme carries about 30% higher load than the
simple approach. This is an interesting result. Our explanation is as follows. Elsewhere [2], we
have found that when the viewserver schemes cannot find an existing feasible path, this path is
usually very long (more than 11 hops). This causes our viewserver hierarchy protocols to reject
VCs that are admitted by the simple approach over long paths. The use of long paths for VCs is
undesirable since it ties up resources at more intermediate nodes, which can be used to admit many
shorter length VCs.

In conclusion, we recommend the vertex-extension scheme as it performs close to or better
than all other schemes in terms of VC carried load and blocking probability over a wide range of
workload. Note that for all viewserver schemes, adding QT yields slightly further improvement.

^ Recall that we assume a blocked VC setup request is cleared (i.e. lost).

[Plot: carried VC load vs. arrival rate]

[Plot: percent VC blocking vs. arrival rate]

Figure 9: Percent VC blocking versus arrival rate for Network 1.


6.2  Results  for  Network  2 

The parameters of the second network, referred to as Network 2, are the same as the parameters
of Network 1. However, a different seed is used for the random number generation, resulting in a
different topology and distribution of source-destination end-system pairs for the VCs.

We again take C = 20, and we fix 1/μ at 15000. Our evaluation measures were computed for
a set of 100,000 VC setup requests. Table 2, and Figures 10 and 11 show the results. Similar
conclusions to Network 1 hold for Network 2. An interesting exception is that at high values of λ,
we observe that the vertex-extension scheme performs slightly better than the vertex-extension-QT
scheme (about 4.2% higher carried VC load). The reason is the following: Adding QT gives richer
merged views, and hence increases the chance of finding a feasible path that is possibly long. As
explained in Section 6.1, this results in performance degradation.


Scheme                 Precinct Size       Merged View Size      No. of Viewservers Queried
base                   4 / 16.32 / 33      4 / 57.61 / 80        1 / 5.52 / 6
base-QT                4 / 16.32 / 33      30 / 60.64 / 80       6 / 6.00 / 6
vertex-extension       17 / 90.36 / 282    16 / 159.70 / 214     1 / 5.52 / 6
vertex-extension-QT    17 / 90.36 / 282    113 / 166.97 / 214    6 / 6.00 / 6

Table 2: Precinct sizes, merged view sizes, and number of viewservers queried for Network 2
(each entry is minimum / average / maximum).


We have repeated the above evaluations for other networks and obtained similar conclusions.

7  Conclusions 

We presented a hierarchical VC routing protocol for ATM-like networks. Our protocol satisfies QoS
constraints, adapts to dynamic topology changes, and scales well to a large number of nodes.

Our  protocol  uses  partial  views  maintained  by  viewservers.  The  viewservers  are  organized 
hierarchically. To set up a VC, the source end-system queries viewservers to obtain a merged view
that  contains  itself  and  the  destination  end-system.  This  merged  view  is  then  used  to  compute  a 
source  route  for  the  VC. 

We  evaluated  several  viewserver  hierarchy  schemes  and  compared  them  to  the  simple  approach. 
Our  results  on  2764-node  networks  indicate  that  the  vertex-extension  scheme  performs  close  to  or 
better  than  the  simple  approach  in  terms  of  VC  carried  load  and  blocking  probability  over  a  wide 
range of real-time workload. It also reduces the memory requirement by up to two orders
of magnitude. We note that our protocol scales even better on larger size networks [3].

In all the viewserver schemes we studied, each switch is a viewserver. In practice, not all
switches need to be viewservers. We may associate one viewserver with a group of switches. This is
particularly attractive in ATM networks where each signaling entity is responsible for establishing
VCs across a group of nodes. In such an environment, viewservers and signaling entities can be
combined.

Figure 10: Carried VC load versus arrival rate for Network 2.

Figure 11: Percent VC blocking versus arrival rate for Network 2.

However, there is an advantage of each switch being a viewserver; that is, source nodes do not
require fixed source routes to their parent viewservers (in the view-query protocol). This reduces
the amount of hand configuration required. In fact, the base and base-QT viewserver schemes do
not require any hand configuration.

Our evaluation model assumed that views are instantaneously updated, i.e. there is no delayed
feedback between link cost changes and view/route changes. We plan to investigate the effect of
delayed feedback on the performance of the different schemes. We expect our viewserver schemes to
outperform the simple approach in this realistic setting as the update of views of the viewservers
requires less time and communication overhead. Thus, views in our viewserver schemes will be more
up-to-date.

As we pointed out in [3], the only drawback of our protocol is that to obtain a source route
for a VC, views are merged at (or prior to) the VC setup, thereby increasing the setup time. This
drawback is not unique to our scheme [8, 16, 7, 11]. Reference [3] describes several ways, including
caching and replication, to reduce the setup overhead and improve performance.

Abstract

Traditional inter-domain routing protocols based on superdomains maintain either “strong”
or “weak” ToS and policy constraints for each visible superdomain. With strong constraints,
a valid path may not be found even though one exists. With weak constraints, an invalid
domain-level path may be treated as a valid path.

We present an inter-domain routing protocol based on superdomains, which always finds
a valid path if one exists. Both strong and weak constraints are maintained for each visible
superdomain. If the strong constraints of the superdomains on a path are satisfied, then the
path is valid. If only the weak constraints are satisfied for some superdomains on the path, the
source uses a query protocol to obtain a more detailed “internal” view of these superdomains
and searches again for a valid path. Our protocol handles topology changes, including node/link
failures that partition superdomains. Evaluation results indicate our protocol scales well to large
internetworks.


Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network
Architecture and Design - packet networks; store and forward networks; C.2.2 [Computer-Communication
Networks]: Network Protocols - protocol architecture; C.2.m [Routing Protocols]; F.2.m [Computer
Network Routing Protocols].


This work is supported in part by ARPA and Philips Labs under contract DASG60-92-0055 to Department
of Computer Science, University of Maryland, and by National Science Foundation Grant No. NCR 89-04590.
The views, opinions, and/or findings contained in this report are those of the author(s) and should
not be interpreted as policies, either expressed or implied, of the Advanced Research Projects
Agency, PL, NSF, or the U.S. Government. Computer facilities were provided in part by NSF grant
CCR-8811954.

type-of-service (ToS) constraints of applications (e.g. low delay, high throughput, high reliability,
minimum monetary cost), each node maintains a cost for each outgoing link and ToS. The
intra-domain routing protocol should choose optimal paths based on these costs.

Across all domains, an inter-domain routing protocol is executed that provides routes between
source and destination nodes in different domains, using the services of the intra-domain routing
protocols within domains. This protocol should have the following properties:

(1)  It  should  satisfy  the  policy  constraints  of  domains.  To  do  this,  it  must  keep  track  of  the 
policy constraints of domains [5].

(2)  An  inter-domain  routing  protocol  should  also  satisfy  ToS  constraints  of  applications.  To  do 
this,  it  must  keep  track  of  the  ToS  services  offered  by  domains  [5]. 

(3)  An  inter-domain  routing  protocol  should  scale  up  to  very  large  internetworks,  i.e.  with  a  very 
large  number  of  domains.  Practically  this  means  that  processing,  memory  and  communication 
requirements  should  be  much  less  than  linear  in  the  number  of  domains.  It  should  also 
handle  non-hierarchical  domain  interconnections  at  any  level  [8]  (e.g.  we  do  not  want  to 
hand-configure  special  routes  as  “back-doors”). 

(4)  An  inter-domain  routing  protocol  should  automatically  adapt  to  link  cost  changes  and  node/link 
failures and repairs, including failures that partition domains [13].

A  Straight-Forward  Approach 

A straight-forward approach to inter-domain routing is domain-level source routing with a link-state
approach [7, 5]. In this approach, each router³ maintains a domain-level view of the internetwork,
i.e., a graph with a vertex for every domain and an edge between every two neighbor domains.
Policy and ToS information is attached to the vertices and the edges of the view.

When a source node needs to reach a destination node, it (or a router⁴ in the source’s domain)
first examines this view and determines a domain-level source route satisfying ToS and policy
constraints, i.e., a sequence of domain ids starting from the source’s domain and ending with the
destination’s domain. Then packets are routed to the destination using this domain-level source
route and the intra-domain routing protocols of the domains crossed.

For  example,  consider  the  internetwork  of  Figure  2  (each  circle  is  a  domain,  and  each  thin  line 

³ Not all nodes maintain routing tables. A router is a node that maintains a routing table.

⁴ Referred to as the policy server in [7].

1 Introduction


A computer internetwork, such as the Internet, is an interconnection of backbone networks, regional
networks, metropolitan area networks, and stub networks (campus networks, office networks and
other small networks)¹. Stub networks are the producers and consumers of the internetwork traffic,
while backbones, regionals and MANs are transit networks. Most of the networks in an internetwork
are stub networks. Each network consists of nodes (hosts, routers) and links. A node that has a
link to a node in another network is called a gateway. Two networks are neighbors when there is
one or more links between gateways in the two networks (see Figure 1).


Figure  1:  A  portion  of  an  internetwork.  (Circles  represent  stub  networks.) 

An internetwork is organized into domains². A domain is a set of networks (possibly consisting
of only one network) administered by the same agency. Domains are typically subject to policy
constraints, which are administrative restrictions on inter-domain traffic [7, 11, 8, 5]. The policy
constraints of a domain U are of two types: transit policies, which specify how other domains
can use the resources of U (e.g. $0.01 per packet, no traffic from domain V); and source policies,
which specify constraints on traffic originating from U (e.g. domains to avoid/prefer, acceptable
connection cost). Transit policies of a domain are public (i.e. available to other domains), whereas
source policies are usually private.

Within each domain, an intra-domain routing protocol is executed that provides routes between
source and destination nodes in the domain. This protocol can be any of the typical ones, i.e.,
next-hop or source routes computed using distance-vector or link-state algorithms. To satisfy

¹ For example, NSFNET and MILNET are backbones, and Suranet and Cerfnet are regionals.

² Also referred to as routing domains or administrative domains.

is a domain-level interconnection). Suppose a node in d1 desires a connection to a node in another
domain. Suppose the policy constraints of d3 and d19 do not allow transit traffic originating from
d1. Every node maintains this information in its view. Thus the source node can choose a valid path
from source domain d1 to the destination domain avoiding d3 and d19 (e.g. thick line in the figure).
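
Choosing a domain-level source route that respects transit policies can be sketched as a search over the domain-level view. This is an illustration under assumptions: the five-domain topology is made up rather than taken from Figure 2, ToS constraints are ignored, and breadth-first search (fewest domains crossed) is our choice of selection rule.

```python
from collections import deque

def domain_level_route(neighbors, no_transit_from, src, dst):
    """Shortest domain-level path from src to dst that respects transit policies.

    no_transit_from[d] is the set of source domains whose traffic d refuses to carry.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        d = queue.popleft()
        if d == dst:
            route = []
            while d is not None:
                route.append(d)
                d = parent[d]
            return route[::-1]
        for n in neighbors[d]:
            if n in parent:
                continue
            # A domain may be entered if it is the destination or accepts transit from src.
            if n == dst or src not in no_transit_from.get(n, set()):
                parent[n] = d
                queue.append(n)
    return None

# Hypothetical domain-level view; d3 refuses transit traffic originating from d1.
neighbors = {"d1": ["d2", "d3"], "d2": ["d1", "d4"], "d3": ["d1", "d5"],
             "d4": ["d2", "d5"], "d5": ["d3", "d4"]}
no_transit_from = {"d3": {"d1"}}
print(domain_level_route(neighbors, no_transit_from, "d1", "d5"))  # ['d1', 'd2', 'd4', 'd5']
```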


Figure  2:  An  example  interdomain  topology. 

The disadvantage of this straightforward scheme is that it does not scale up for large
internetworks. The storage at each router is proportional to N_D × E_D, where N_D is the number of
domains and E_D is the average number of neighbor domains to a domain. The communication cost for
updating views is proportional to N_R × E_R, where N_R is the number of routers in the internetwork
and E_R is the average number of router neighbors of a router (topology changes are flooded to all
routers in the internetwork).

The  Superdomain  Approach 

To achieve scaling, several approaches based on hierarchically aggregating domains into
superdomains have been proposed [16, 14, 6]. Here, each domain is a level 1 superdomain, “close”
level 1 superdomains are grouped into level 2 superdomains, “close” level 2 superdomains are grouped
into level 3 superdomains, and so on (see Figure 3). Each router x maintains a view that contains the
level 1 superdomains in x’s level 2 superdomain, the level 2 superdomains in x’s level 3 superdomain
(excluding x’s level 2 superdomain), and so on. Thus a router maintains a smaller view than
it would in the absence of hierarchy. For the superdomain hierarchy of Figure 3, the views of two
routers (one in domain d1 and one in domain d16) are shown in Figures 4 and 5.

Figure 3: An example of superdomain hierarchy.

Figure 4: View of a router in d1.

Figure 5: View of a router in d16.

The superdomain approach has several problems. One problem is that the aggregation results
in loss of domain-level ToS and policy information. A superdomain is usually characterized by a
single set of ToS and policy constraints derived from the ToS and policy constraints of the domains
in it. Routers outside the superdomain assume that this set of constraints applies uniformly to
each of its children (and by recursion to each domain in the superdomain). If there are domains
with different (possibly contradictory) constraints in a superdomain, then there is no good way of
deriving the ToS and policy constraints of the superdomain.

The usual technique [16] of obtaining ToS and policy constraints of a superdomain is to obtain
either a strong set of constraints or a weak set of constraints⁵ from the ToS and policy constraints of
the children superdomains in it. If strong (weak) constraints are used for policies, the superdomain
enforces a policy constraint if that policy constraint is enforced by some (all) of its children. If
strong (weak) constraints are used for ToS constraints, the superdomain is assumed to support a
ToS if that ToS is supported by all (some) of its children. The intention is that if strong (weak)
constraints of a superdomain are (are not) satisfied then any (no) path through that superdomain
is valid.

⁵ “Strong” and “weak” are referred to respectively as “union” and “intersection” in [16].

Each approach has problems. Strong constraints can eliminate valid paths, and weak constraints
can allow invalid paths. For example in Figure 3, d16 allows transit traffic from d1 while d19 does
not; with strong constraints G would not allow transit traffic from d1, and with weak constraints
G would allow transit traffic from d1 to be routed via d19.
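
The strong and weak aggregation rules amount to set unions and intersections over the children. The sketch below is illustrative only: the dictionary layout and constraint labels are hypothetical, and G is assumed, for the sake of the example, to have just the two children d16 and d19.

```python
def aggregate_constraints(children):
    """children: list of dicts with 'policies' (enforced constraints) and 'tos' (supported ToS)."""
    policies = [c["policies"] for c in children]
    tos = [c["tos"] for c in children]
    strong = {
        "policies": set.union(*policies),         # enforced by some child
        "tos": set.intersection(*tos),            # supported by all children
    }
    weak = {
        "policies": set.intersection(*policies),  # enforced by all children
        "tos": set.union(*tos),                   # supported by some child
    }
    return strong, weak

# Hypothetical children of G: d16 allows transit from d1, d19 does not.
d16 = {"policies": set(), "tos": {"low-delay"}}
d19 = {"policies": {"no-transit-from-d1"}, "tos": {"low-delay", "high-throughput"}}
strong, weak = aggregate_constraints([d16, d19])
print(strong["policies"])  # {'no-transit-from-d1'}: G appears to refuse d1 entirely
print(weak["policies"])    # set(): G appears to accept d1, even via d19
```

The two printed sets show both failure modes from the example: the strong set hides the valid route through d16, and the weak set hides the restriction imposed by d19.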

Other problems of the superdomain approach are that the varying visibilities of routers
complicate superdomain-level source routing and handling of node/link failures (especially those
that partition superdomains). The usual technique for solving these problems is to augment
superdomain-level views with gateways [16] (see Section 3).

Our  Contribution 

In  this  paper,  we  present  an  inter-domain  routing  protocol  based  on  superdomains,  which  finds 
a  valid  path  if  and  only  if  one  exists.  Both  strong  and  weak  constraints  are  maintained  for  each 
visible  superdomain.  If  the  strong  constraints  of  the  superdomains  on  a  path  are  satisfied,  then 
the  path  is  valid.  If  only  the  weak  constraints  are  satisfied  for  some  superdomains  on  the  path,  the 
source uses a query protocol to obtain a more detailed “internal” view of these superdomains, and
searches  again  for  a  valid  path. 
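
The decision made for a candidate path can be sketched as a three-way classification. This is a minimal sketch, assuming two hypothetical predicates, strong_ok and weak_ok, that report whether a superdomain's strong or weak constraints are satisfied for the request at hand; the return format is also our own.

```python
def classify_path(path, strong_ok, weak_ok):
    """Return ('valid', []), ('query', superdomains_to_query), or ('invalid', [])."""
    needs_query = []
    for sd in path:
        if strong_ok(sd):
            continue                  # every path through sd is acceptable
        if weak_ok(sd):
            needs_query.append(sd)    # undecided: a more detailed internal view is needed
        else:
            return "invalid", []      # no path through sd can be valid
    return ("query", needs_query) if needs_query else ("valid", [])

# Hypothetical constraint status for three superdomains.
strong = {"A": True, "B": False, "C": False}
weak = {"A": True, "B": True, "C": False}
print(classify_path(["A"], strong.get, weak.get))            # ('valid', [])
print(classify_path(["A", "B"], strong.get, weak.get))       # ('query', ['B'])
print(classify_path(["A", "B", "C"], strong.get, weak.get))  # ('invalid', [])
```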

We use superdomain-level views with gateways and a link-state view update protocol to handle
topology changes including failures that partition superdomains. The storage cost is
O(… × log N_D) without the query protocol. We demonstrate the scaling properties of the query
protocol by giving evaluation results based on simulations. Our evaluation results indicate that the
query protocol can be performed using 15% extra space.

Our protocol consists of two subprotocols: a view-query protocol for obtaining views of
greater resolution when needed; and a view-update protocol for disseminating topology changes
to the views.

Several approaches to scalable inter-domain routing have been proposed, based on the
superdomain hierarchy [1, 14, 16, 9, 6], and the landmark hierarchy [18, 17]. Some of these approaches
suffer from loss of ToS and policy information (and hence may not find a valid path which exists).
Others are still in a preliminary stage. (Details in Section 8.)

One  important  difference  between  these  approaches  and  ours  is  that  ours  uses  a  query  mechanism 
to obtain ToS and policy details whenever needed. In our opinion, such a mechanism is needed
to  obtain  a  scalable  solution.  Query  protocols  are  also  being  developed  to  enhance  the  protocols 
in  [9,  6].  Reference  [2]  presents  protocols  based  on  a  new  kind  of  hierarchy,  referred  to  as  the 
viewserver  hierarchy  (more  details  in  Section  8). 

A  preliminary  version  of  the  view-query  protocol  was  proposed  in  reference  [1].  That  version 
differs greatly from the one in this paper. Here, we augment superdomain-level views with
gateways. In [1], we augmented superdomain-level views with superdomain-to-domain edges (details in
Section  8).  Both  versions  have  the  same  time  and  space  complexity,  but  the  protocols  in  this  paper 
are  much  simpler  conceptually.  Also  the  view-update  protocol  is  not  in  reference  [1]. 

Organization  of  the  paper 

In Section 2, we present some definitions used in this paper. In Section 3, we define the view data
structures. In Section 4, we describe how views are affected by topology changes. In Section 5, we
present the view-query protocol. In Section 6, we present the view-update protocol. In Section 7,
we present our evaluation model and the results of its application to the superdomain hierarchy.
In Section 8, we survey recent approaches to inter-domain routing. In Section 9, we conclude and
describe caching and heuristic schemes to improve performance.

2  Preliminaries 

Each domain has a unique id. Let DomainIds denote the set of domain-ids. Each node has a
unique id. Let NodeIds denote the set of node-ids. For a node x, we use domain_id(x) to denote
the domain-id of x’s domain.

The superdomain hierarchy defines the following parent-child relationship: a level i, i > 1,
superdomain is the parent of each level i − 1 superdomain it contains. Top-level superdomains
have no parents. Level 1 superdomains, which are just domains, have no children. For any two
superdomains X and Y, X is a sibling of Y iff X and Y have the same parent. X is an ancestor
(descendant) of Y iff X = Y or X is an ancestor (descendant) of Y’s parent (child).

Each router maintains information about a subset of superdomains, referred to as its visible
superdomains. The visible superdomains of a router x are (1) x’s domain itself, (2) siblings of x’s
domain, and (3) siblings of ancestors of x’s domain. In Figure 3, the visible superdomains of a
router in d1 are d1, d2, d3, B, C, G, J (these are shown in Figure 4). Note that if a superdomain U
is visible to a router, then no ancestor or descendant of U is visible to the router.
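The three-part definition of visible superdomains can be sketched in a few lines. This is only an illustration: the parent map below is a hypothetical hierarchy (not a transcription of Figure 3), the function names are invented, and top-level superdomains are assumed to count as siblings of one another even though they have no parent.

```python
from typing import Dict, List, Optional, Set

# Hypothetical hierarchy: child superdomain id -> parent id (None at top level).
PARENT: Dict[str, Optional[str]] = {
    "d1": "A", "d2": "A", "d3": "A",
    "A": "D", "B": "D", "C": "D",
    "D": None, "G": None, "J": None,
}

def ancestors(u: str) -> List[str]:
    """Ancestors of u, including u itself, from u up to the top level."""
    chain: List[str] = []
    cur: Optional[str] = u
    while cur is not None:
        chain.append(cur)
        cur = PARENT[cur]
    return chain

def siblings(u: str) -> Set[str]:
    """Superdomains other than u that share u's parent.
    Assumption: top-level superdomains (parent None) are mutual siblings."""
    return {v for v, p in PARENT.items() if p == PARENT[u] and v != u}

def visible_superdomains(domain: str) -> Set[str]:
    """The domain itself, its siblings, and the siblings of its ancestors."""
    visible = {domain}
    for a in ancestors(domain):
        visible |= siblings(a)
    return visible
```

With this hierarchy, a router in d1 sees d1, d2, d3, B, C, G and J, and none of its proper ancestors A or D.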

Each superdomain has a unique id, i.e. unique among all superdomains regardless of level. Let
SuperDomainIds denote the set of superdomain-ids. DomainIds is a subset of SuperDomainIds.
For a superdomain U, let level(U) denote the level of U in the hierarchy, let Ancestors(U) denote
the set of ids of ancestor superdomains of U in the hierarchy, and let Children(U) denote the set
of ids of child superdomains of U in the hierarchy.

For a router x, let VisibleSuperDomains(x) denote the set of ids of superdomains visible from x.

We extend the above definitions by allowing their arguments to be nodes, in which case the node
stands for its domain. For example, if x is a node in domain d, Ancestors(x) denotes Ancestors(d).

3  Superdomain-Level  Views  with  Gateways 

For routing purposes, each domain (and node) has an address, defined as the concatenation of the
superdomain ids starting from the top level and going down to the domain (node). For example in
Figure 3, the address of domain d15 is G.E.d15, and the address of a node h in d15 is G.E.d15.h.
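As a small illustration of the address format, the sketch below builds an address by walking a parent map from the domain up to the top level. The three-entry parent map is a hypothetical fragment chosen to match the example address G.E.d15, and the function name is an assumption.

```python
from typing import Dict, Optional

# Hypothetical fragment of the hierarchy: d15 inside E, E inside top-level G.
PARENT: Dict[str, Optional[str]] = {"d15": "E", "E": "G", "G": None}

def address(domain: str, node: Optional[str] = None) -> str:
    """Concatenate superdomain ids from the top level down to the domain,
    followed by the node id when one is given."""
    parts = []
    cur: Optional[str] = domain
    while cur is not None:
        parts.append(cur)
        cur = PARENT[cur]
    parts.reverse()                 # top level first
    if node is not None:
        parts.append(node)
    return ".".join(parts)
```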

When a source node needs to reach a destination node, it first determines the visible superdomain
in the destination address and then, by examining its view, determines a superdomain-level
source route (satisfying ToS and policy constraints) to this superdomain. However, since routers
in different superdomains maintain views of different sets of superdomains, this superdomain-level
source route can be meaningless at some intermediate superdomain’s router x because the next
superdomain in this source route is not visible to x. For example in Figure 4, superdomain-level
source route (d2, B, G, C) created at a router in d2 becomes meaningless once the packet is in G,
where C is not visible.

The usual technique of solving this problem is to augment superdomain-level views with gateways
and edges between these gateways.

Define the pair U:g to be an sd-gateway iff U is a superdomain and g is a node that is in U and
has a link to a node outside U. Equivalently, we say that g is a gateway of U.

Define (U:g, h) to be an actual-edge iff U:g is an sd-gateway, h is a gateway not in U, and there
is a link from g to h.

Define (U:g, h) to be a virtual-edge iff U:g and U:h are sd-gateways and g ≠ h (note that there
may not be a link between g and h).

(U:g, h) is an edge iff it is an actual-edge or a virtual-edge. An edge (U:g, h) is also said to be
an outgoing edge of U:g. Define edges of U:g to be the set of edges outgoing from U:g. Define edges
of U to be the set of edges outgoing from any gateway of U.

Let Gateways(U) denote the set of node-ids of gateways of U. Let Edges(U:g) denote the edges
of U:g. Note that we never use “edge” as a synonym for link.

A gateway g of a domain can generate many sd-gateways; specifically, U:g for every ancestor U
of g’s domain such that g has a link to a node outside U. A link (g, h), where g and h are gateways
in different domains, can generate many actual-edges; specifically, actual-edge (U:g, h) for every
ancestor U of g’s domain such that U is not an ancestor of h’s domain.

For the internetwork topology of Figure 2, the corresponding gateway-level connections are
shown in Figure 6 where black rectangles are gateways. For the hierarchy of Figure 3, gateway
g in Figure 6 generates sd-gateways d16:g, E:g, and G:g. The link (g, h) in Figure 6 generates
actual-edges (d16:g, h), (E:g, h), (G:g, h).
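The way one gateway and one link fan out into several sd-gateways and actual-edges can be sketched as follows. The hierarchy fragment, the placement of h in a domain d7 outside G, and the function names are assumptions made for illustration; links are treated as directed pairs, and the sketch does not check that h is itself a gateway.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical fragment: d16 inside E inside G; h lives in d7, which is outside G.
PARENT: Dict[str, Optional[str]] = {"d16": "E", "E": "G", "G": None, "d7": "B", "B": None}
DOMAIN_OF: Dict[str, str] = {"g": "d16", "h": "d7"}   # node -> its domain
LINKS: Set[Tuple[str, str]] = {("g", "h")}            # directed links (from, to)

def ancestors(u: str) -> List[str]:
    """Ancestors of u, including u itself, from u up to the top level."""
    chain: List[str] = []
    cur: Optional[str] = u
    while cur is not None:
        chain.append(cur)
        cur = PARENT[cur]
    return chain

def inside(node: str, superdomain: str) -> bool:
    """True iff the node's domain is the superdomain or one of its descendants."""
    return superdomain in ancestors(DOMAIN_OF[node])

def sd_gateways(g: str) -> List[Tuple[str, str]]:
    """U:g for every ancestor U of g's domain such that g links to a node outside U."""
    return [(U, g) for U in ancestors(DOMAIN_OF[g])
            if any(a == g and not inside(b, U) for a, b in LINKS)]

def actual_edges(g: str, h: str) -> List[Tuple[str, str, str]]:
    """(U:g, h) for every ancestor U of g's domain that is not an ancestor of h's domain."""
    if (g, h) not in LINKS:
        return []
    return [(U, g, h) for U in ancestors(DOMAIN_OF[g]) if not inside(h, U)]
```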

To a router, at most one of the sd-gateways generated by a gateway g is visible, namely U:g
where U is an ancestor of g’s domain and U is visible to the router. At most one of the actual-edges
generated by a link (g, h) between two gateways in different domains is visible to the router, namely
edge (U:g, h) where U:g is visible to the router. None of the actual-edges are visible to the router
if g and h are inside a visible superdomain. For example in Figure 3, of the actual-edges generated
by link (g, h), only (G:g, h) is visible to a router in d1, and only (d16:g, h) is visible to a router in
d16.

A router maintains a view consisting of the visible sd-gateways and their outgoing actual- and
virtual-edges. An edge (U:g, h) in the view of a router connects the sd-gateway U:g to the
sd-gateway V:h such that V:h is visible to the router. For the superdomain-level views of Figures 4
and 5, the new views are shown in Figures 7 and 8, respectively.

Figure 6: Gateway-level connections of internetwork of Figure 2.

Figure 7: View of a router in d1. Figure 8: View of a router in d16.


The view of a router x contains, for each superdomain U that is visible to x or is an ancestor
of x, the strong and weak constraints of U and a set referred to as Gateways&Edges_x(U). This
set contains, for each gateway y of U, the edges of U:y and their costs. The reason for storing
information about ancestor superdomains is given in Section 5. The cost field is used to satisfy ToS
constraints and is described in Section 4. The time stamp field is described in Section 6. Formally,
the view of x is defined as follows:

View_x. View of x.

\[ \mathit{View}_x = \{\, (U,\ \mathit{strong\_constraints}(U),\ \mathit{weak\_constraints}(U),\ \mathit{Gateways\&Edges}_x(U)) : U \in \mathit{VisibleSuperDomains}(x) \cup \mathit{Ancestors}(x) \,\} \]

where

Gateways&Edges_x(U). Sd-gateways and edges of U.

\[ \mathit{Gateways\&Edges}_x(U) = \{\, (y,\ \mathit{timestamp},\ \{ (z,\ \mathit{cost}) : (U{:}y,\ z) \in \mathit{Edges}(U{:}y) \}) : y \in \mathit{Gateways}(U) \,\} \]
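One possible in-memory layout for this view is sketched below. The class and field names are hypothetical, constraints are represented as opaque sets of labels (their actual form is not specified here), and the cost is a tuple with one value per ToS metric.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Cost = Tuple[float, ...]          # one value per ToS metric (assumed representation)

@dataclass
class GatewayEntry:
    """Entry for one gateway y of U: a timestamp plus the edges (U:y, z) and their costs."""
    timestamp: int = 0
    edges: Dict[str, Cost] = field(default_factory=dict)   # z -> cost of (U:y, z)

@dataclass
class SuperdomainEntry:
    """Entry for one superdomain U in the view of a router."""
    strong_constraints: FrozenSet[str]
    weak_constraints: FrozenSet[str]
    gateways: Dict[str, GatewayEntry] = field(default_factory=dict)  # y -> entry

# A view maps each visible or ancestor superdomain id to its entry.
View = Dict[str, SuperdomainEntry]
```

A router's view would then hold one `SuperdomainEntry` per element of VisibleSuperDomains(x) ∪ Ancestors(x).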

ToS  and  policy  constraints  can  also  be  specified  for  each  sd-gateway  and  edge.  Our  protocols 
can  be  extended  to  handle  such  constraints,  but  we  have  not  done  so  here  in  order  to  keep  their 
descriptions  simple. 

A superdomain-level source route is now a sequence of sd-gateway ids. With this definition, it
is  easy  to  verify  that  whenever  the  next  superdomain  in  a  superdomain-level  source  route  is  not 
visible  to  a  router,  there  is  an  actual-edge  (hence  a  link)  between  the  router  and  the  next  gateway 
in  this  route. 

4  Edge-Costs  and  Topology  Changes 

A cost is associated with each edge. The cost of an edge equals a vector of values if the edge is up;
each cost value indicates how expensive it is to cross the edge according to some ToS constraint.
The cost equals ∞ if the edge is an actual-edge and it is down, or the edge is a virtual-edge (U:g, h)
and h cannot be reached from g without leaving U.

Since an actual-edge represents a physical link, its cost can be determined from measured link
statistics. The cost of a virtual-edge (U:g, h) is an aggregation of the cost of physical links in
U and is calculated as follows: If U is a domain, the cost of (U:g, h) is calculated as the
maximum/minimum/average cost of the routes within U from g to h [4]. For higher level superdomains
U, the cost of (U:g, h) is derived from the costs of edges between the gateways of children
superdomains of U.
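For the domain-level case, one of the aggregations mentioned above (the minimum route cost) can be sketched with Dijkstra's algorithm restricted to the nodes of U. This is a simplified illustration under stated assumptions: a single scalar cost stands in for the cost vector, the adjacency format and function name are hypothetical, and the maximum or average variants are not shown.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

def virtual_edge_cost(links: Dict[str, List[Tuple[str, float]]],
                      members: Set[str], g: str, h: str) -> float:
    """Minimum cost of a route from g to h that never leaves `members`.
    Returns math.inf when h cannot be reached from g without leaving U."""
    dist = {g: 0.0}
    heap = [(0.0, g)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == h:
            return d
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry
        for v, c in links.get(u, []):
            if v not in members:
                continue                  # routes must stay inside U
            nd = d + c
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf
```

The infinite result corresponds to the case in which the virtual-edge cost is ∞ because h is unreachable from g inside U.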

Link cost changes and link/node failures and repairs correspond to cost changes, failures and
repairs of actual- and virtual-edges. Thus the attributes of edges in the views of routers must be
regularly updated. For this, we employ a view-update protocol (see Section 6).

Link/node failures can also partition a superdomain into cells, where a cell of a superdomain
is defined to be a maximal subset of nodes of the superdomain that can reach each other without
leaving the superdomain. Superdomain partitions can occur at any level in the hierarchy. For
example, suppose U is a domain and V is its parent superdomain. U can be partitioned into cells
without V being partitioned (i.e. if the cells of U can reach each other without leaving V). The
opposite can also happen: if all links between U and the other children of V fail, then V becomes
partitioned but U does not. Or both U and V can be partitioned. In the same way, link/node
repairs can merge cells into bigger cells.
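The notion of a cell amounts to a connected component computed over the superdomain's own nodes only. The sketch below assumes undirected links given as a symmetric adjacency map containing only links that are currently up; the function name and example topology are hypothetical.

```python
from typing import Dict, List, Set

def cells(members: Set[str], adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Partition `members` into maximal subsets whose nodes can reach each
    other using only up links between nodes of `members`."""
    remaining = set(members)
    result: List[Set[str]] = []
    while remaining:
        start = remaining.pop()
        cell = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v in remaining:        # v is a member not yet assigned to a cell
                    remaining.remove(v)
                    cell.add(v)
                    stack.append(v)
        result.append(cell)
    return result

# Domain U = {u1, u2} has lost its internal link, but both nodes still reach
# w, which lies in the parent superdomain V = {u1, u2, w}.
adj = {"u1": {"w"}, "u2": {"w"}, "w": {"u1", "u2"}}
```

Here U splits into two cells while V remains a single cell, which is the first case described above.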

We handle superdomain partitioning as follows: A router detects that a superdomain U is
partitioned when a virtual-edge of U in the router’s view has cost ∞. When a router forwards
a packet to a destination for which the visible superdomain, say U, in the destination address is
partitioned into cells, a copy of the packet is sent to each cell by sending a copy of the packet to
each gateway of U; the id U in the destination address is “marked” in the packet so that subsequent
routers do not create new copies of the packet for U.

5  View-Query  Protocol 

When a source node wants a superdomain-level source route to a destination, a router in its domain
examines its view and searches for a valid path (i.e. superdomain-level source route) using the
destination address.⁶ We refer to this router as the source router. Even though the source router
does not know the constraints of the individual domains that are to be crossed in each superdomain,
it does know the strong and weak constraints of the superdomains. We refer to a superdomain
whose strong constraints are satisfied as a valid superdomain. If a superdomain’s weak constraints
are satisfied but strong constraints are not satisfied, then there may be a valid path through this
superdomain. We refer to such a superdomain as a candidate superdomain.

A path is valid if it involves only valid superdomains. A path cannot be valid if it involves
a superdomain which is neither valid nor candidate. We refer to a path involving only valid and
candidate superdomains as a candidate path.

⁶ We assume that the source has the destination’s address. If that is not the case, it would first query the name
servers to obtain the address for the destination. Querying the name servers can be done the same way it is done
currently in the Internet. It requires nodes to have a set of fixed addresses to name servers. This is also sufficient in
our case.

If the source router’s view contains a candidate path
$\langle U_0{:}g_{0_0}, \ldots, U_0{:}g_{0_{n_0}}, U_1{:}g_{1_0}, \ldots, U_1{:}g_{1_{n_1}}, \ldots, U_m{:}g_{m_0}, \ldots, U_m{:}g_{m_{n_m}} \rangle$
to the destination (and does not contain a valid path), then for each candidate
superdomain $U_i$ on this path, the source router queries gateway $g_{i_0}$ of $U_i$ for the internal view of
$U_i$. This internal view consists of the constraints, sd-gateways and edges of the child superdomains
of $U_i$.
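The classification of superdomains and paths, and the choice of which gateways to query, can be sketched as follows. This is a minimal illustration and not the procedure of Figure 15: the outcome of checking strong and weak constraints is assumed to be available as a status per superdomain, a path is a list of (superdomain, gateway) pairs, and all function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple

SdGateway = Tuple[str, str]       # (superdomain id, gateway id)

def classify_superdomain(strong_ok: bool, weak_ok: bool) -> str:
    """valid if the strong constraints hold, candidate if only the weak ones do."""
    if strong_ok:
        return "valid"
    return "candidate" if weak_ok else "invalid"

def classify_path(path: List[SdGateway], status: Dict[str, str]) -> str:
    """valid: only valid superdomains; candidate: only valid and candidate ones."""
    kinds = {status[sd] for sd, _ in path}
    if kinds <= {"valid"}:
        return "valid"
    if kinds <= {"valid", "candidate"}:
        return "candidate"
    return "invalid"

def gateways_to_query(path: List[SdGateway], status: Dict[str, str]) -> List[SdGateway]:
    """For each consecutive run of a candidate superdomain on the path,
    pick the first gateway of that run as the one to query."""
    queries = []
    for sd, run in groupby(path, key=lambda hop: hop[0]):
        first_gateway = next(run)[1]
        if status[sd] == "candidate":
            queries.append((sd, first_gateway))
    return queries
```

Because runs are grouped by consecutive occurrence, a superdomain that appears twice on a path yields two queries, one per entry gateway.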

When a router x receives a request for the internal view of an ancestor superdomain U, it
returns the following data structure:

IView_x(U). Internal view of U at router x.

\[ \mathit{IView}_x(U) = \{\, (V,\ \mathit{strong\_constraints}(V),\ \mathit{weak\_constraints}(V),\ \mathit{Gateways\&Edges}_x(V)) \in \mathit{View}_x : V \in \mathit{Children}(U) \,\} \]

It is to simplify the construction of IView_x(U) that we store information about ancestor
superdomains in the view of router x. Instead of storing this information, router x could construct
IView_x(U) from the constraints, sd-gateways and edges of the visible descendants of U. We did
not choose this alternative because the extra information does not increase storage complexity.

When the source router receives the internal view of a superdomain U, it does the following:
(1) it removes the sd-gateways and edges of U from its view; (2) it adds the sd-gateways and edges
of children superdomains in the internal view of U; and (3) it searches for a valid path again. If there
is still no valid path but there are candidate paths, the process is repeated.
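Steps (1) and (2) can be sketched as a pure function on dictionary-shaped views. The representation (superdomain id mapped to an opaque entry) and the function name are assumptions; the search in step (3) is not shown.

```python
from typing import Any, Dict

def merge_internal_view(working_view: Dict[str, Any], sd: str,
                        internal_view: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new working view in which superdomain `sd` is replaced by the
    entries of its child superdomains taken from the received internal view."""
    merged = {u: entry for u, entry in working_view.items() if u != sd}  # step (1)
    merged.update(internal_view)                                         # step (2)
    return merged
```

Returning a new dictionary leaves the router's stored view untouched, which fits a working view that is allocated per destination.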

For example, consider Figure 3. For a router in superdomain d1 (see Figure 7), G is visible and
is a candidate superdomain. The internal view of G is shown in Figure 9, and the resulting merged view
is shown in Figure 10. The valid path through G (visiting d16 and avoiding d19) can be discovered
using this merged view (since the strong constraints of E are satisfied).

Consider a candidate route to a destination:
$\langle U_0{:}g_{0_0}, \ldots, U_0{:}g_{0_{n_0}}, U_1{:}g_{1_0}, \ldots, U_1{:}g_{1_{n_1}}, \ldots, U_m{:}g_{m_0}, \ldots, U_m{:}g_{m_{n_m}} \rangle$.
If a superdomain $U_i$ is partitioned into cells, it may re-appear later in the
candidate path (i.e. for some $j \neq i$, $U_j = U_i$). In this case both gateways $g_{i_0}$ and $g_{j_0}$ are queried.
Timestamps are used to resolve conflicts between the information reported by these gateways.

The view-query protocol uses two types of messages as follows:

•  (RequestIView, sdid, gid, s_address, d_address)

Sent by a source router to gateway gid to obtain the internal view of superdomain sdid.
s_address is the address of the source router, d_address is the address of the destination
node (of the desired route).

•  (ReplyIView, sdid, gid, iview, d_address)

where iview is the internal view of superdomain sdid, and other parameters are as in the
RequestIView message. It is sent by gateway gid to the source router.

Figure 9: Internal view of G. Figure 10: Merged view at d1.

The state maintained by a source router x is listed in Figure 15. PendingReq_x is used to
avoid sending new request messages before receiving all outstanding reply messages. WView_x and
PendingReq_x are allocated and deallocated on demand for each destination.

The events of router x are specified in Figure 15. In the figure, * is a wild-card matching any
value. The TimeOut_x event is executed after a time-out period from the execution of the Request_x
event to indicate that the request has not been satisfied. The source host can then repeat the same
request afterwards.

The procedure search_x uses an operation “ReliableSend(m) to v”, where m is the message being
sent and v is either an address of an arbitrary router or an id of a gateway of a visible superdomain.
ReliableSend is asynchronous. The message is delivered to v as long as there is a sequence of up
links between the sending router u and v.⁷ (Note that an address is not needed to obtain an
inter-domain route to a gateway of a visible superdomain.)

Router Failure Model: A router can undergo failures and recoveries at any time. We
assume failures are fail-stop (i.e. a failed router does not send erroneous messages). When a router
x recovers, the variables WView_x and PendingReq_x are lost for all destinations. The cost of each
edge in View_x is set to ∞. It becomes up-to-date as the router receives new information from other
routers.

⁷ This involves time-outs, retransmissions, etc. It requires transport protocol support such as TCP.


6  View-Update  Protocol 

A gateway g, for each ancestor superdomain U, informs other routers of topology changes (i.e.
failures, repairs and cost changes) affecting U:g’s edges. The communication is done by flooding
messages. The flooding is restricted to the routers in the parent superdomain of U, since U is
visible only to these routers.

Due  to  the  nature  of  flooding,  a  router  can  receive  information  out  of  order  from  a  gateway.  In 
order  to  avoid  old  information  replacing  new  information,  each  gateway  includes  increasing  time 
stamps  in  the  messages  it  sends.  Routers  maintain  for  each  gateway  the  highest  received  time 
stamp (in the timestamp field in View_x), and discard messages with smaller timestamps. Time
stamps  do  not  have  to  be  real-time  clock  values. 

Due  to  superdomain  partitioning,  messages  sent  by  a  gateway  may  not  reach  all  routers  within 
the  parent  superdomain,  resulting  in  some  routers  having  out-of-date  information.  This  out-of-date 
information  can  cause  inconsistencies  when  the  partition  is  repaired.  To  eliminate  inconsistencies, 
when  a  link  recovers,  the  two  routers  at  the  ends  of  the  link  exchange  their  views  and  flood  any  new 
information.  As  usual,  information  about  a  superdomain  U  is  flooded  over  U's  parent  superdomain. 

The view-update protocol uses messages of the following form:

•  (Update, sdid, gid, timestamp, edge-set)

Sent by the gateway gid to inform other routers about current attributes of edges of sdid:gid.
timestamp indicates the time stamp of gid. edge-set contains a cost for each edge.
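How a router might process an Update message using timestamps can be sketched as below. This is not the event specification of Figure 16: the dictionary layout and function name are hypothetical, and treating a message with an equal timestamp as a duplicate to be dropped is an assumption made so that flooding terminates.

```python
from typing import Dict

def apply_update(view: Dict[str, Dict[str, dict]], sdid: str, gid: str,
                 timestamp: int, edge_set: Dict[str, float]) -> bool:
    """Apply (Update, sdid, gid, timestamp, edge_set) to `view`.
    Returns True when the message carried new information, in which case the
    caller would flood it onward; returns False when it is discarded."""
    gateways = view.setdefault(sdid, {})
    entry = gateways.get(gid)
    if entry is not None and timestamp <= entry["timestamp"]:
        return False                      # old or duplicate information
    gateways[gid] = {"timestamp": timestamp, "edges": dict(edge_set)}
    return True
```

For example, an update with timestamp 1 that arrives after one with timestamp 2 from the same gateway is discarded and leaves the stored costs unchanged.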

The state maintained by a router x is listed in Figure 16. Note that AdjLocalRouters_x or
AdjForeignGateways_x can be empty. IntraDomainRT_x contains a route (next-hop or source)⁸ for
every reachable node of the domain. We assume that consecutive reads of Clock_x return increasing
values.

Routers also receive and flood messages containing edges of sd-gateways of their ancestor
superdomains. This information is used by the query protocol (see Section 5). Also, the highest
timestamp received from a gateway g of an ancestor superdomain is needed to avoid exchanging
the messages of g infinitely during flooding.

⁸ IntraDomainRT_x is a view in case of a link-state routing protocol or a distance table in case of a distance-vector
routing protocol.

The events of router x are specified in Figure 16. We use Ancestor_i(U) to denote the
superdomain-id of the ith ancestor of U, where Ancestor_0(U) = U. In the view-update protocol, a node u uses
send operations of the form “Send(m) to v”, where m is the message being sent and v is the
destination-id. Here, nodes u and v are neighbors, and the message is sent over the physical link
(u, v). If the link is down, we assume that the packet is dropped.

7  Evaluation 

In the superdomain hierarchy (without the query protocol), the number of superdomains in a view
is logarithmic in the number of superdomains in the internetwork [10].⁹ However, the storage
required for a view is proportional not to the number of superdomains in it but to the number of
sd-gateways in it. As we have seen, there can be more than one sd-gateway for a superdomain in
a view.

In fact, the superdomain hierarchy does not scale up for arbitrary internetworks; that is, the
number of sd-gateways in a view can be proportional to the number of domains in the internetwork.
For example, if each domain in a superdomain U has a distinct gateway with a link to outside U,
the number of sd-gateways of U would be linear in the number of domains in U.

The good news is that the superdomain hierarchy does scale up for realistic internetwork
topologies. A sufficient condition for scaling is that each superdomain has at most log N_d sd-gateways;
this condition is satisfied by realistic internetworks since most domain interconnections are
“hierarchical connections”, i.e. between backbones and regionals, between regionals and MANs, and so
on.

In this section, we present an evaluation of the scaling properties of the superdomain hierarchy
and the query protocol. To evaluate any inter-domain routing protocol, we need a model in which
we can define internetwork topologies, policy/ToS constraints, inter-domain routing hierarchies,
and evaluation measures (e.g. memory and time requirements). We have recently developed such
a model [3]. We first describe our model, and then use it to evaluate our superdomain hierarchy.
Our evaluation measures are the amount of memory required at the routers, and the amount of
time needed to construct a path.

⁹ Even though the results in [10] were for intra-domain routing, it is easy to show that the analysis there holds
for inter-domain routing as well.


7.1  Evaluation  Model 

We  first  describe  our  method  of  generating  topologies  and  policy/ToS  constraints.  We  then  describe 
the  evaluation  measures. 

Generating  Internetwork  Topologies 

For our purposes, an internetwork topology is a directed graph where the nodes correspond to
domains and the edges correspond to domain-level connections. However, an arbitrary graph will
not do. The topology should have the characteristics of a real internetwork, like the Internet.
That is, it should have backbones, regionals, MANs, LANs, etc.; there should be hierarchical
connections, but some “non-hierarchical” connections should also be present.

For brevity, we refer to backbones as class 0 domains, regionals as class 1 domains, metropolitan-
area domains and providers as class 2 domains, and campus and local-area domains as class 3
domains. A (strictly) hierarchical interconnection of domains means that class 0 domains are
connected to each other, and for i > 0, class i domains are connected to class i − 1 domains.
As mentioned above, we also want some “non-hierarchical” connections, i.e., domain-level edges
between domains irrespective of their classes (e.g. from a campus domain to another campus
domain or to a backbone domain).

In reality, domains span geographical regions and domain-level edges are often between
domains that are geographically close (e.g. the University of Maryland campus domain is connected to
the SURANET regional domain, both of which are on the East Coast). We also want some edges that
are between far domains. A class i domain usually spans a larger geographical region than a class i + 1
domain. To generate such interconnections, we associate a “region” attribute with each domain. The
intention is that two domains with the same region are geographically close.

The region of a class i domain has the form r_0.r_1. ... .r_i, where the r_j’s are integers. For
example, the region of a class 3 domain can be 1.2.3.4. For brevity, we refer to the region of a
class i domain as a class i region.

Note that regions have their own hierarchy, which should not be confused with the superdomain
hierarchy. Class 0 regions are the top-level regions. We say that a class i region r_0.r_1. ... .r_{i−1}.r_i
is contained in the class i − 1 region r_0.r_1. ... .r_{i−1} (where i > 0). Containment is transitive. Thus
region 1.2.3.4 is contained in regions 1.2.3, 1.2 and 1.


Given any pair of domains, we classify them as local, remote or far, based on their regions.
Let X be a class i domain and Y a class j domain, and without loss of generality let i ≤ j. X
and Y are local if they are in the same class i region. For example in Figure 11, A is local to
B, C, J, K, M, N, O, P, and Q. X and Y are remote if they are not in the same class i region but
they are in the same class i − 1 region, or if i = 0. For example in Figure 11, A is remote to
D, E, F, and L. X and Y are far if they are not local or remote. For example
in Figure 11, A is far to I.

We refer to a domain-level edge as local (remote, or far) if the two domains it connects are local
(remote, or far).
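The local/remote/far classification can be sketched on regions written as integer tuples. The sketch rests on one reading of the definition, stated here as an assumption: two domains are remote when they are not local but share a class i − 1 region, or when i = 0, where i is the smaller of the two classes. The function name is hypothetical.

```python
from typing import Tuple

Region = Tuple[int, ...]   # a class i region is a tuple of i + 1 integers

def classify_pair(region_x: Region, region_y: Region) -> str:
    """Classify two domains as 'local', 'remote' or 'far' from their regions."""
    if len(region_x) > len(region_y):
        region_x, region_y = region_y, region_x   # ensure class of X <= class of Y
    i = len(region_x) - 1                         # class of X
    if region_y[: i + 1] == region_x:
        return "local"                            # same class i region
    if i == 0 or region_y[:i] == region_x[:i]:
        return "remote"                           # same class i-1 region, or i = 0 (assumed reading)
    return "far"
```

Under this reading, a class 1 domain in region 1.2 is local to a class 2 domain in region 1.2.3, remote to one in region 1.3.0, and far from a class 1 domain in region 2.2.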

We use the following procedure to generate internetwork topologies:

•  We first specify the number of domain classes, and the number of domains in each class.

•  We next specify the regions. Note that the number of region classes equals the number of
domain classes. We specify the number of class 0 regions. For each class i > 0, we specify a
branching factor, which creates that many class i regions in each class i − 1 region. (That is,
if there are two class 0 regions and the class 1 branching factor equals three, then there are
six class 1 regions.)

•  For each class i, we randomly map the class i domains into the class i regions. Note that
several domains can be mapped to the same region, and some regions may have no domain
mapped into them.

•  For every class i and every class j, j > i, we specify the number of local, remote and far
edges to be introduced between class i domains and class j domains. The end points of the
edges are chosen randomly (within the specified constraints).

•  We ensure that the internetwork topology is connected by ensuring that the subgraph of class
0 domains is connected, and each class i domain, for i > 0, is connected to a local class i − 1
domain.

•  Each domain has one gateway. So all neighbors of a domain are connected via this gateway.
This is for simplicity.
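The region-related steps of this procedure (creating regions from branching factors and randomly mapping domains into them) can be sketched as follows. Edge generation and the connectivity check are not shown, and the function names and tuple representation of regions are assumptions.

```python
import random
from typing import List, Tuple

Region = Tuple[int, ...]

def generate_regions(num_class0: int, branching: List[int]) -> List[List[Region]]:
    """regions[i] lists all class i regions; branching[k] is the branching
    factor for class k + 1."""
    regions: List[List[Region]] = [[(r,) for r in range(num_class0)]]
    for factor in branching:
        regions.append([parent + (r,) for parent in regions[-1] for r in range(factor)])
    return regions

def contains(outer: Region, inner: Region) -> bool:
    """True iff `inner` equals `outer` or is (transitively) contained in it."""
    return len(outer) <= len(inner) and inner[: len(outer)] == outer

def map_domains(count: int, class_regions: List[Region], rng: random.Random) -> List[Region]:
    """Randomly assign each of `count` domains to one region of its class.
    Several domains may share a region and some regions may stay empty."""
    return [rng.choice(class_regions) for _ in range(count)]
```

With two class 0 regions and a class 1 branching factor of three, `generate_regions(2, [3])` yields six class 1 regions, matching the example in the list above.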

Choosing  Policy/ToS  Constraints 

We chose a simple scheme to model policy/ToS constraints. Each domain is assigned a color: green
or red. For each domain class, we specify the percentage of green domains in that class, and then
randomly choose a color for each domain in that class.

A  valid  route  from  a  source  to  a  destination  is  one  that  does  not  visit  any  red  intermediate 
domains;  the  source  and  destination  domains  are  allowed  to  be  red. 
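The color-based validity rule can be sketched as a check on a given route together with a breadth-first search over the full domain-level graph. The search is only an illustration of finding a valid route with complete topology knowledge; it is not one of the protocols of this paper, and the function names and example graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

def is_valid_route(route: List[str], color: Dict[str, str]) -> bool:
    """A route is valid iff no intermediate domain is red (end points may be red)."""
    return all(color[d] == "green" for d in route[1:-1])

def find_valid_route(adj: Dict[str, List[str]], color: Dict[str, str],
                     src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search that only transits green domains."""
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            route = []
            cur: Optional[str] = u
            while cur is not None:
                route.append(cur)
                cur = parent[cur]
            return route[::-1]
        if u != src and color[u] != "green":
            continue                      # a red domain cannot be used for transit
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

adj = {"s": ["r", "g1"], "r": ["t"], "g1": ["g2"], "g2": ["t"]}
color = {"s": "red", "r": "red", "g1": "green", "g2": "green", "t": "red"}
```

In this example the two-hop route through the red domain r is rejected, and the search returns the longer route through g1 and g2.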

This simple scheme can model many realistic policy/ToS constraints, such as security constraints
and bandwidth requirements. It cannot model some important kinds of constraints, such as delay
bounds.

Computing Evaluation Measures

The evaluation measures of most interest for an inter-domain routing protocol are its memory, time
and communication requirements. We postpone the precise definitions of the evaluation measures
to the next subsection.

The only analysis method we have at present is to numerically compute the evaluation measures
for a variety of source-destination pairs. Because we use internetwork topologies of large sizes, it
is not feasible to compute for all possible source-destination pairs. We randomly choose a set
of source-destination pairs that satisfy the following conditions: (1) the source and destination
domains are different stub domains, and (2) there exists a valid path from the source domain to the
destination domain in the internetwork topology. (Note that the straightforward scheme would
always find such a path.)

7.2  Application  to  Superdomain  Query  Protocol 

We use the above model to evaluate our superdomain query protocol for several different
superdomain hierarchies. For each hierarchy, we define a set of superdomain-ids and a parent-child
relationship on them.

The first superdomain hierarchy scheme is referred to as child-domains. Each domain d
(regardless of its class) is a level-1 superdomain, also identified as d. In addition, for each backbone d,
we create a distinct level-4 superdomain referred to as d-4. For each regional d, we create a distinct
level-3 superdomain d-3 and make it a child of a randomly chosen level-4 superdomain e-4 such
that d and e are local and connected. For each MAN d, we create a distinct level-2 superdomain
d-2 and make it a child of a randomly chosen level-3 superdomain e-3 such that d and e are local
and connected. Please see Figure 12.

We next describe how the level-1 superdomains (i.e. the domains) are placed in the hierarchy.
A backbone d is placed in, i.e. as a child of, d-4. A regional d is placed in d-3. A MAN d is placed
in d-2. A stub d is placed in e-2 such that d and e are local and connected. Please see Figure 12.
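
The construction of child-domains can be sketched as a parent map. This is a minimal illustration, assuming that the "local and connected" relation has been precomputed into a hypothetical `local_neighbors` table and that every regional, MAN and stub has at least one suitable neighbor (a backbone, a regional and a MAN respectively).

```python
import random

def build_child_domains(domains, local_neighbors, seed=0):
    """Return a dict mapping each superdomain id to its parent id.

    domains: dict domain -> class ("backbone", "regional", "MAN", "stub").
    local_neighbors: dict domain -> list of domains that are local and
        connected to it (assumed precomputed from the topology).
    Ids follow the text: "d" is level 1, "d-2".."d-4" are higher levels.
    """
    rng = random.Random(seed)
    parent = {}

    def pick(d, cls):
        # Randomly choose a local, connected domain e of class `cls`.
        choices = [e for e in local_neighbors[d] if domains[e] == cls]
        return rng.choice(choices)

    for d, cls in domains.items():
        if cls == "backbone":
            parent[d] = f"{d}-4"                           # d-4 itself has no parent
        elif cls == "regional":
            parent[f"{d}-3"] = f"{pick(d, 'backbone')}-4"  # d-3 under some e-4
            parent[d] = f"{d}-3"
        elif cls == "MAN":
            parent[f"{d}-2"] = f"{pick(d, 'regional')}-3"  # d-2 under some e-3
            parent[d] = f"{d}-2"
        else:                                              # stub
            parent[d] = f"{pick(d, 'MAN')}-2"
    return parent
```

The sibling-domains variant would differ only in the three lines that place a backbone, regional or MAN domain itself.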

The second superdomain hierarchy scheme is referred to as sibling-domains. It is identical
to child-domains except for the placement of level-1 superdomains corresponding to backbones,
regionals and MANs. In sibling-domains, a backbone d is placed as a sibling of d-4. A regional d
is placed as a sibling of d-3. A MAN d is placed as a sibling of d-2. Please see Figure 13.

The third superdomain hierarchy scheme is referred to as leaf-domains.
In leaf-domains, backbones and regionals are placed in some level-2 superdomain, as follows. A
regional d, if superdomain d-3 has a child superdomain e-2, is placed in e-2. Otherwise, a new
level-2 superdomain d-2 is created and placed in d-3, and d is placed in d-2. A backbone d, if superdomain
d-4 has a child superdomain f-3, is placed in the level-2 superdomain containing the regional f.
Otherwise, a new level-3 superdomain d-3 is created and placed in d-4, a new level-2 superdomain
d-2 is created and placed in d-3, and d is placed in d-2. Please see Figure 14.

Note that in leaf-domains, all level-1 superdomains are placed under level-2 superdomains,
whereas the other schemes allow some level-1 superdomains to be placed under higher-level
superdomains.


The fourth superdomain hierarchy scheme is referred to as regions. In this scheme, the
superdomain hierarchy corresponds exactly to the region hierarchy used to generate the internetwork
topology. That is, for a class 1 region x there is a distinct level 5 (top level) superdomain x-5. For
a class 2 region x.y there is a distinct level 4 superdomain x.y-4 placed under level 5 superdomain
x-5, and so on. Each domain is placed under the superdomain of its region. Please see Figure 11.

Results for Internetwork 1

The parameters of the first internetwork topology, referred to as Internetwork 1, are shown in
Table 1.


                                                             Edges between Classes i and j
Class i  No. of Domains  No. of Regions  % of Green Domains  Class j  Local  Remote  Far
0        10              4               0.80                0        8      6       0
1        100             16              0.75                0        190    20      [illegible]
                                                             1        26     5       0
2        1000            64              0.70                0        100    0       0
                                                             1        1060   40      0
                                                             2        200    40      0
3        10000           256             0.20                0        100    0       0
                                                             1        100    0       0
                                                             2        10100  50      0
                                                             3        50     50      50

Table 1: Parameters of Internetwork 1.


Our evaluation measures were computed for a (randomly chosen but fixed) set of 100,000
source-destination pairs. For a source-destination pair, we refer to the length of the shortest valid path in
the internetwork topology as the shortest-path length, or spl for short. The minimum spl of these
pairs was 2, the maximum spl was 15, and the average spl was 6.84.

For each source-destination pair, the set of candidate paths is examined in shortest-first order
until either a valid path is found or the set is exhausted and no valid path is found.
For each candidate path, RequestIView messages are sent to all candidate superdomains on this
path in parallel. All ReplyIView messages are received in time proportional to the round-trip
time to the farthest of these superdomains. Hence, the total time requirement is proportional to the
number of candidate paths queried multiplied by the round-trip time to the farthest superdomain
in these paths. Let msgsize denote the sum of average RequestIView message size and average

Branching factor is 4 for all region classes.

Scheme           No query needed  Candidate Paths  Candidate Superdomains
child-domains    220              3.31/13          7.35/38
sibling-domains  220              3/10             6.17/22
leaf-domains     219              6.31/24          15.94/66
regions          544              3.70/12          7.79/30

Table 2: Queries for Internetwork 1.

ReplyIView message size. The number of candidate superdomains queried times msgsize indicates
the communication capacity required to ship the RequestIView and ReplyIView messages.

Table 2 lists for each superdomain scheme the average and maximum number of candidate paths
and candidate superdomains queried. As apparent from the table, [...] other schemes and
leaf-domains is much worse than the rest. [...] only one domain d in a superdomain U is actually
going to be crossed. [...] descendants [...] containing d may need to be queried to obtain a valid
path (e.g. to [...] backbone [...], it may be necessary to query for superdomain A-4, then B-3,
then C-2).
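
The two cost measures can be compared with a few lines of code. The sketch below is illustrative only: the averages are copied from Table 2, while the `rtt` and `msgsize` values are placeholder assumptions, since the text gives the costs only up to proportionality.

```python
# Average number of candidate paths and candidate superdomains queried,
# taken from Table 2 (Internetwork 1). The sibling-domains path count is
# printed as "3" in the source table.
TABLE2_AVG = {
    "child-domains":   (3.31, 7.35),
    "sibling-domains": (3.0, 6.17),
    "leaf-domains":    (6.31, 15.94),
    "regions":         (3.70, 7.79),
}

def query_cost(paths, superdomains, rtt, msgsize):
    """Return (time, communication) estimates, up to constant factors."""
    time_cost = paths * rtt                # candidate paths are tried one after another
    comm_cost = superdomains * msgsize     # one request/reply pair per superdomain
    return time_cost, comm_cost

def rank_by_communication(rtt=1.0, msgsize=1.0):
    """Order the schemes from cheapest to most expensive in communication."""
    costs = {s: query_cost(p, sd, rtt, msgsize) for s, (p, sd) in TABLE2_AVG.items()}
    return sorted(costs, key=lambda s: costs[s][1])
```

Because both measures are linear in their inputs, the ranking does not depend on the placeholder values chosen for `rtt` and `msgsize`.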


                 Initial view size                Merged view size
Scheme           in sd-gateways  in superdomains  in sd-gateways  in superdomains
child-domains    964/1006        [?]2/60          1089/1282       100/298
sibling-domains  1167/1269       70/9[?]          1470/2190       148/337
leaf-domains     963/1006        40/[?]0          1108/1322       130/411
regions          492/715         85/163           [illegible]     [illegible]

Table 3: View sizes for Internetwork 1.

Table 3 lists for each superdomain scheme the average and maximum of the initial view size
and of the merged view size. The initial view size indicates the memory requirement [...]
without using the query protocol (i.e. assuming the initial view has a valid path). The merged
view size indicates the memory requirement at a router during the query protocol (after [...]
path). The memory requirement at a router is O(view size in number of sd-gateways × E_G), where
E_G is the average number of edges of an sd-gateway. Note that the source does not need to store
information about red and non-transit domains in the merged views (other than the ones already
in the initial view). The numbers for the merged view sizes in Table 3 take advantage of this.

As apparent from the table, leaf-domains, child-domains and regions scale better than
sibling-domains. There are two reasons for this. First, placing a backbone (regional or MAN) domain d as a
sibling to d-4 (d-3 or d-2) doubles the number of level 4 (3 or 2) superdomains in the views of routers.
Second, since these domains have many edges to the domains in their associated superdomains, the
end points of each of these edges become sd-gateways of the associated superdomains. Note that
regions scales much better than the other schemes in the initial view size. This is because most
edges are local (i.e. contained within regions), and thus contained completely in superdomains. Hence,
their end points are not sd-gateways.

Overall,  the  child-domains  and  regions  schemes  scale  best  in  space,  time  and  communication 
requirements.  We  have  repeated  the  above  evaluations  for  two  other  internetworks  and  obtained 
similar  conclusions.  The  results  are  in  Appendix  A. 


8  Related  Work 

In this section, we survey recently proposed inter-domain routing protocols that support ToS and
policy routing for large internetworks.

Nimrod [6] and IDPR [16] use the link-state approach with domain-level source routing to
enforce policy and ToS constraints, and superdomains to solve the scaling problem. Nimrod is still in
the design stage. Both protocols suffer from loss of policy and ToS information as mentioned in the
introduction. A query protocol for Nimrod is being developed to obtain more detailed policy, ToS
and topology information.

BGP [12] and IDRP [14] are based on a path-vector approach [15]. Here, for each destination
domain a router maintains a set of paths, one through each of its neighbor routers. ToS and policy
information is attached to these paths. Each router requires O(N_d × N_d × E_r) space, where N_d
is the average number of neighbor domains for a domain and E_r is the number of routers in the
internetwork. For each destination, a router exchanges its best valid path with its neighbor routers.
However, a path-vector algorithm may not find a valid path from a source to the destination even
if such a route exists [16] (i.e. detailed ToS and policy information may be lost). By exchanging k
paths to each destination, the probability of detecting a valid path for each source can be increased.
But to guarantee detection, either all possible paths should be exchanged (an exponential number of
paths in the worst case) or source policies should be made public and routers should take this into
account when exchanging routes. However, this fix increases space and communication requirements
drastically.
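
The loss of a valid path under path-vector routing can be reproduced in a toy setting. The sketch below is hypothetical: it models a source policy as a set of domains that must not be crossed and assumes each router advertises only its shortest path, which is a simplification of what real path-vector protocols do.

```python
def advertise_best(paths):
    """A path-vector router forwards only its single preferred path (here: the shortest)."""
    return min(paths, key=len)

def valid_for(path, forbidden):
    """Source policy modelled as a set of domains that must not be crossed (an assumption)."""
    return not any(domain in forbidden for domain in path)

# Router u knows two paths to destination D.
p1 = ["A", "D"]
p2 = ["B", "C", "D"]
# The domain of u's neighbor v refuses to route through A.
v_forbidden = {"A"}

received = advertise_best([p1, p2])       # u tells v only about p1
v_finds_path = valid_for(received, v_forbidden)
path_existed = any(valid_for(p, v_forbidden) for p in (p1, p2))
```

Here `v_finds_path` is false although `path_existed` is true: the only path v hears about violates its policy, while the path it could have used was never advertised.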

IDRP [14] uses superdomains to solve the scaling problem. It exchanges all paths between
neighbor routers subject to the following constraint: a router does not inform a neighbor router
of a route if usage of the route by the neighbor would violate some superdomain's constraint on
the route. IDRP also suffers from loss of ToS and policy information. To overcome this problem,
it uses overlapping superdomains: that is, a domain and superdomain can be in more than one
parent superdomain. If a valid path over a domain cannot be discovered because the constraints
of a parent superdomain are violated, the same path may be discovered through another parent
superdomain whose constraints are not violated. However, handling ToS and policy constraints
in general requires more and more combinations of overlapping superdomains, resulting in a larger
storage requirement.

Reference [9] combines the benefits of the path-vector approach and the link-state approach by having
two modes: an NR mode, which is an extension of IDRP and is used for the most common ToS
and policy constraints; and an SDR mode, which is like IDPR and is used for less frequent ToS and
policy requests. This study does not address the scalability of the SDR mode. Ongoing work by
this group considers a new SDR mode which is not based on IDPR.

Reference  [19]  suggests  the  use  of  multiple  addresses  for  each  node,  one  for  each  ToS  and  Policy. 
This  scheme  does  not  scale  up.  In  fact,  it  increases  the  storage  requirement,  since  a  router  maintains 
a  route  for  each  destination  address,  and  there  are  more  addresses  with  this  scheme. 

The landmark hierarchy [18, 17] is another approach for solving the scaling problem. Here, each
router is a landmark with a radius, and routers which are at most radius away from the landmark
maintain a route for it. Landmarks are organized hierarchically, such that the radius of a landmark
increases with its level, and the radii of top level landmarks include all routers. Addressing and

For example, suppose a router u has two paths P1 and P2 to the destination. Let u have a router neighbor v,
which is in another domain. u chooses and informs v of one of the paths, say P1. But P1 may violate source policies
of v's domain, and P2 may be a valid path for v.

packet forwarding schemes are introduced. Link-state algorithms cannot be used with the landmark
hierarchy, and a thorough study of enforcing ToS and policy constraints with this hierarchy has
not been done.

In [1], we provided an alternative solution to loss of policy and ToS information that is perhaps
more faithful to the original superdomain hierarchy. To handle superdomain-level source routing
and topology changes, we augmented each superdomain-level edge (U,V) with the address of an
“exit” domain u in U and an “entry” domain v in V. To obtain internal views, we added for
each visible superdomain U the edges from U to domains outside the parent of U. Surprisingly,
this approach and the gateway-level view approach have the same memory and communication
requirements. However, the first approach results in much more complicated protocols.

Reference [2] presents interdomain routing protocols based on a new kind of hierarchy, referred
to as the viewserver hierarchy. This approach also scales well to large internetworks and does
not lose detailed ToS and policy information. Here, special routers called viewservers maintain the
view of domains in a surrounding precinct. Viewservers are organized hierarchically such that
for each viewserver, there is a domain of a lower level viewserver in its view, and views of top
level viewservers include domains of other top level viewservers. Appropriate addressing and route
discovery schemes are introduced.

9  Conclusion 

We presented a hierarchical inter-domain routing protocol which satisfies policy and ToS
constraints, adapts to dynamic topology changes including failures that partition domains, and scales
well to a large number of domains.

Our protocol achieves scaling in space requirement by using superdomains. Our protocol
maintains superdomain-level views with sd-gateways and handles topology changes by using a link-state
view update protocol. It achieves scaling in communication requirement by flooding topology
changes affecting a superdomain U over U's parent superdomain.

Our protocol does not lose detail in ToS, policy and topology information. It stores both a
strong set of constraints and a weak set of constraints for each visible superdomain. If the weak
constraints but not the strong constraints of a superdomain U are satisfied (i.e. the aggregation has
resulted in loss of detail in ToS and policy information), then some paths through U may be valid.
Our protocol uses a query protocol to obtain a more detailed “internal” view of such superdomains,
and searches again for a valid path. Our evaluation results indicate that the query protocol can be
performed using 15% extra space.
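
The role of the strong and weak constraint sets can be illustrated with a small classifier. This is a sketch under assumptions: the constraints are modelled as opaque predicates on a request, and the names `Status`, `classify` and `superdomains_to_query` are hypothetical, not taken from the protocol specification.

```python
from enum import Enum

class Status(Enum):
    VALID = "valid"          # strong constraints hold: usable without a query
    CANDIDATE = "candidate"  # only weak constraints hold: query for the internal view
    INVALID = "invalid"      # not even the weak constraints hold

def classify(request, strong, weak):
    """Classify one superdomain for a request, given its two predicates."""
    if strong(request):
        return Status.VALID
    if weak(request):
        return Status.CANDIDATE
    return Status.INVALID

def superdomains_to_query(path, request, constraints):
    """Return the superdomains on `path` whose internal views must be fetched.

    constraints: dict superdomain -> (strong, weak) predicate pair.
    Returns None if some superdomain on the path is invalid.
    """
    to_query = []
    for u in path:
        status = classify(request, *constraints[u])
        if status is Status.INVALID:
            return None
        if status is Status.CANDIDATE:
            to_query.append(u)
    return to_query
```

An empty result means the path is already valid in the current view; a non-empty result lists exactly the superdomains for which the query protocol is needed.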

One  drawback  of  our  protocols  is  that  to  obtain  a  source  route,  views  are  merged  at  or  prior 
to  the  connection  setup,  thereby  increasing  the  setup  time.  This  drawback  is  not  unique  to  our 
scheme  [7,  16,  6,  9].  There  are  several  ways  to  reduce  this  setup  overhead.  First,  source  routes 
to  frequently  used  destinations  can  be  cached.  Second,  the  internal  views  of  frequently  queried 
superdomains  can  be  cached  at  routers  close  to  the  source  domain.  Third,  better  heuristics  to 
choose  candidate  paths  and  candidate  superdomains  to  query  can  be  developed. 

We also described an evaluation model for inter-domain routing protocols. This model can be
applied to other inter-domain routing protocols. We have not done so because precise definitions of
the hierarchies in these protocols are not available. For example, to do a fair evaluation of IDPR [16],
we need precise guidelines for how to group domains into superdomains, and how to choose between
the strong and weak methods when defining policy/ToS constraints of superdomains. In fact, these
protocols have not been evaluated in a way that we can compare them to the superdomain hierarchy.

A Results for Other Internetworks

Results  for  Internetwork  2 

The  parameters  of  the  second  internetwork  topology,  referred  to  as  Internetwork  2,  are  the  same  as 
the  parameters  of  Internetwork  1  but  a  different  seed  is  used  for  the  random  number  generation. 

Our  evaluation  measures  were  computed  for  a  set  of  100,000  source-destination  pairs.  The 
minimum  spl  of  these  pairs  was  1,  the  maximum  spl  was  14,  and  the  average  spl  was  7.13. 

Tables 4 and 5 show the results. Conclusions similar to those for Internetwork 1 hold.


Results  for  Internetwork  3 

The  parameters  of  the  third  internetwork  topology,  referred  to  as  Internetwork  3,  are  shown  in 
Table  6.  Internetwork  3  is  more  connected,  more  class  0,  1  and  2  domains  are  green,  and  more 
class  3  domains  are  red.  Hence,  we  expect  bigger  view  sizes  in  number  of  sd-gateways. 

Scheme           No query needed  Candidate Paths  Candidate Superdomains
child-domains    205              [illegible]      10.22/47
sibling-domains  205              3.01/8           6.50/21
leaf-domains     205              8.80/32          21.34/82
regions          640              3.52/10          7.85/28

Table 4: Queries for Internetwork 2.


                 Initial view size                Merged view size
Scheme           in sd-gateways  in superdomains  in sd-gateways  in superdomains
child-domains    958/1012        43/60            1079/1269       118/306
sibling-domains  1153/1283       72/101           1480/2169       160/324
leaf-domains     956/1009        41/58            1095/1281       156/387
regions          624/1024        110/231          1356/3578       206/435

Table 5: View sizes for Internetwork 2.

Our  evaluation  measures  were  computed  for  a  set  of  100,000  source-destination  pairs.  The 
minimum  spl  of  these  pairs  was  1,  the  maximum  spl  was  11,  and  the  average  spl  was  5.95. 

Tables 7 and 8 show the results. Conclusions similar to those for Internetworks 1 and 2 hold.


Branching factor is 4 for all domain classes.

                                                             Edges between Classes i and j
Class i  No. of Domains  No. of Regions  % of Green Domains  Class j  Local        Remote       Far
0        10              4               0.85                0        8            7            0
1        100             16              0.80                0        190          20           0
                                                             1        50           20           0
2        1000            64              0.75                0        500          50           0
                                                             1        [illegible]  [illegible]  [illegible]
                                                             2        200          40           0
3        10000           256             0.10                0        300          50           0
                                                             1        250          100          0
                                                             2        10250        150          50
                                                             3        200          150          100

Table 6: Parameters of Internetwork 3.


Scheme           No query needed  Candidate Paths  Candidate Superdomains
child-domains    142              3.99/29          7.70/43
sibling-domains  142              2.95/10          5.39/22
leaf-domains     142              9.65/70          18.99/103
regions          676              3.47/17          6.25/21

Table 7: Queries for Internetwork 3.

                 Initial view size                Merged view size
Scheme           in sd-gateways  in superdomains  in sd-gateways  in superdomains
child-domains    2160/2239       43/60            2354/2647       107/348
sibling-domains  2365/2504       72/101           2606/3314       148/356
leaf-domains     2159/2236       41/58            2386/2645       160/648
regions          1107/1644       110/231          1850/3559       194/436

Table 8: View sizes for Internetwork 3.

Variables:

  View_x. Dynamic view of x.

  WView_x(d_address). Temporary view of x. d_address is the destination address.
      Used for merging internal views of superdomains to the view of x.

  PendingReq_x(d_address). Integer. d_address is the destination address.
      Number of outstanding request messages.

Events:

  Request_x(d_address)  {Executed when x wants a valid domain-level source route}
      allocate WView_x(d_address) := View_x; allocate PendingReq_x(d_address) := 0;
      search_x(d_address);

    where

      search_x(d_address)
        if there is a valid path to d_address in WView_x(d_address) then
          result := shortest valid path;
          deallocate WView_x(d_address), PendingReq_x(d_address);
          return result;
        else if there is a candidate path to d_address in WView_x(d_address) then
          Let cpath = (U_0:g_00, ..., U_0:g_0n_0, U_1:g_10, ..., ...)
              be the shortest candidate path;
          for U_i in cpath such that U_i is candidate do
            ReliableSend(RequestIView, U_i, g_i0, address(x), d_address) to g_i0;
            PendingReq_x(d_address) := PendingReq_x(d_address) + 1;
        else
          deallocate WView_x(d_address), PendingReq_x(d_address);
          return failure;
        endif
        endif

  TimeOut_x(d_address)  {Executed after a time-out period and PendingReq_x(d_address) ≠ 0.}
      deallocate WView_x(d_address), PendingReq_x(d_address);
      return failure;


Figure 15: view-query protocol: State and events of a router x. (Figure continued on next page.)

  Receive_x(RequestIView, sdid, x, s_address, d_address)
      ReliableSend(ReplyIView, sdid, x, IView_x(sdid), d_address) to s_address;

  Receive_x(ReplyIView, sdid, gid, iview, d_address)
      if PendingReq_x(d_address) ≠ 0 then  {No time-out happened}
        PendingReq_x(d_address) := PendingReq_x(d_address) - 1;
        {merge internal view}
        delete (sdid, *, *, *) from WView_x;
        for (child, scons, wcons, gateway-set) in iview do
          if ¬∃(child, *, *, *) ∈ WView_x then
            insert (child, scons, wcons, gateway-set) in WView_x;
          else
            for (gid, ts, edge-set) in gateway-set do
              if ∃(gid, timestamp, *) ∈ Gateways&Edges_x(child) ∧ ts > timestamp then
                delete (gid, *, *) from Gateways&Edges_x(child);
              endif;
              if ¬∃(gid, *, *) ∈ Gateways&Edges_x(child) then
                insert (gid, ts, edge-set) to Gateways&Edges_x(child);
              endif
          endif
        if PendingReq_x(d_address) = 0 then  {All pending replies are received}
          search_x(d_address);
        endif
      endif

Figure 15: view-query protocol: State and events of a router x. (cont.)
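
The control flow of the view-query protocol can be sketched as a synchronous loop. This is an illustrative simplification, not the protocol itself: the real protocol is event driven with time-outs and parallel requests, and all four callables below are hypothetical hooks supplied by the caller.

```python
def find_route(view, dest, find_valid_path, find_candidate_path,
               fetch_internal_view, merge):
    """Synchronous rendering of the request/search/receive cycle.

    Hypothetical hooks:
      find_valid_path(wview, dest)     -> path or None
      find_candidate_path(wview, dest) -> list of candidate superdomains (may be empty)
      fetch_internal_view(sdid)        -> internal view (stands in for the
                                          RequestIView/ReplyIView exchange)
      merge(wview, sdid, iview)        -> None, updates wview in place
    """
    wview = dict(view)                     # temporary working copy of the view
    while True:
        path = find_valid_path(wview, dest)
        if path is not None:
            return path                    # shortest valid path found
        candidates = find_candidate_path(wview, dest)
        if not candidates:
            return None                    # failure: no candidate path left
        for sdid in candidates:            # the protocol sends these in parallel
            merge(wview, sdid, fetch_internal_view(sdid))
```

The loop terminates only if `merge` replaces each queried superdomain by its children, so that the same superdomain is never a candidate twice; the caller's view is left untouched because the search works on a copy.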

Constants:

  AdjLocalRouters_x. (⊆ NodeIds). Set of neighbor routers in x's domain.

  AdjForeignGateways_x. (⊆ NodeIds). Set of neighbor routers in other domains.

  Ancestor_i(x). (∈ SuperDomainIds). ith ancestor of x.

Variables:

  View_x. Dynamic view of x.

  IntraDomainRT_x. Intra-domain routing table of x. Initially contains no entries.

  Clock_x : Integer. Clock of x.

Events:

  Receive_x(Update, sdid, gid, ts, edge-set) from sender
      if ∃(gid, timestamp, *) ∈ Gateways&Edges_x(sdid) ∧ ts > timestamp then
        delete (gid, *, *) from Gateways&Edges_x(sdid);
      endif;
      if ¬∃(gid, *, *) ∈ Gateways&Edges_x(sdid) then
        flood_x((Update, sdid, gid, ts, edge-set));
        insert (gid, ts, edge-set) to Gateways&Edges_x(sdid);
        update_parent_domains_x(level(sdid) + 1);
      endif

    where

      update_parent_domains_x(startinglevel)
        for level := startinglevel to number of levels in the hierarchy do
          sdid := Ancestor_level(x);
          if x ∈ Gateways(sdid) then
            edge-set := aggregate edges of sdid:x using View_x, IntraDomainRT_x and links of x;
            timestamp := Clock_x;
            flood_x((Update, sdid, x, timestamp, edge-set));
            delete (x, *, *) from Gateways&Edges_x(sdid);
            insert (x, timestamp, edge-set) to Gateways&Edges_x(sdid);
          endif

  Do_Update_x  {Executed periodically and upon a change in IntraDomainRT_x or links of x}
      update_parent_domains_x(1)

  Link_Recovery_x(y)  {(x,y) is a link. Executed when (x,y) recovers.}
      for all (sdid, *, *, *) in View_x do
        if ∃i : Ancestor_i(y) = Ancestor_i(sdid) then
          for all (gid, timestamp, edge-set) in Gateways&Edges_x(sdid) do
            Send((Update, sdid, gid, timestamp, edge-set)) to y;
        endif

  flood_x(packet)
      for all y ∈ AdjLocalRouters_x do
        Send(packet) to y;
      for all y ∈ AdjForeignGateways_x ∧ ∃i : Ancestor_i(y) = Ancestor_i(packet.sdid) do
        Send(packet) to y;

Figure 16: view-update protocol: State and events of a router x.
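
The timestamp rule applied when an Update message arrives can be sketched as follows. This is a minimal illustration, assuming a hypothetical nested-dictionary stand-in for the gateway and edge table of a router; flooding and the update of parent superdomains are left to the caller.

```python
def receive_update(gateways_edges, sdid, gid, ts, edge_set):
    """Apply one Update to the table; return True if it should be flooded on.

    gateways_edges: dict sdid -> dict gid -> (timestamp, edge_set), a
    hypothetical in-memory stand-in for the per-superdomain gateway table.
    """
    entries = gateways_edges.setdefault(sdid, {})
    if gid in entries and ts > entries[gid][0]:
        del entries[gid]                    # stored entry is older: drop it
    if gid not in entries:
        entries[gid] = (ts, edge_set)       # accept the new information
        return True                         # caller floods it and updates parents
    return False                            # duplicate or stale update: ignore
```

Returning False for an update whose timestamp is not newer than the stored one is what stops a flooded message from circulating forever.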

Optimization in Non-Preemptive Scheduling for

Abstract

Real-time computer systems have become more and more important in many applications,
such as robot control, flight control, and other mission-critical jobs. The
correctness of the system depends on the temporal correctness as well as the functional
correctness of the tasks. We propose a scheduling algorithm based on an analytic
model. Our goal is to derive the optimal schedule for a given set of aperiodic tasks
such that the number of rejected tasks is minimized, and then the finish time of the
schedule is also minimized. The scheduling problem with a nonpreemptive discipline
in a uniprocessor system is considered. We first show that if a total ordering is given,
this can be done in O(n^2) time by a dynamic programming technique, where n is the
size of the task set. When the restriction of the total ordering is released, it is known
to be NP-complete [3]. We discuss the super sequence [18] which has been shown to
be useful in reducing the search space for testing the feasibility of a task set. By
extending the idea and introducing the concept of conformation, the scheduling process
can be divided into two phases: computing the pruned search space, and computing
the optimal schedule for each sequence in the search space. While the complexity of
the algorithm in the worst case remains exponential, our simulation results show that
the cost is reasonable for the average case.

*This work is supported in part by Honeywell under N00014-91-C-0195 and Army/Phillips under
DASG-60-92-C-0055. The views, opinions, and/or findings contained in this report are those of the author(s) and
should not be interpreted as representing the official policies, either expressed or implied, of Honeywell or
Army/Phillips.

1 Introduction


In a hard real-time system, the computer is required to support the execution of applications
in which the timing constraints of the tasks are specified. The correctness of the system
depends on the temporal correctness as well as the functional correctness of the tasks. Failure
to satisfy the timing constraints can incur fatal errors. Once a task is accepted by the system,
the system should be able to finish it under the timing constraint of the task. A task T_i can
be characterized as a triple (r_i, c_i, d_i), representing the ready time, the computation time,
and the deadline of the task, respectively. A task cannot be started before its ready time.
Once started, the task must use the processor for a consecutive period of c_i, and be finished
by its deadline. The task set is represented as Γ = {T_1, T_2, ..., T_n}. A task set is feasible if
there exists a schedule in which all the tasks in the task set can meet their timing constraints.

Scheduling is a process of binding starting times to the tasks such that each task executes
according to the schedule. A sequence is written S = (T_1^S, T_2^S, ..., T_k^S), where k ≤ n, and
T_i^S represents the ith task of the sequence S for any 1 ≤ i ≤ k. A sequence specifies the order
in which the tasks are executed. Without confusion, a schedule can be represented as a sequence. How
to schedule the tasks so that the timing constraints are met is nontrivial. Many scheduling
problems are known to be intractable [3] in that finding the optimal schedule requires large
amounts of computation.
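
The feasibility of one fixed sequence can be checked in a single pass. The sketch below is an illustration under one assumption: each task starts as early as possible, i.e. at the later of its ready time and the finish time of its predecessor. The names `Task` and `finish_time` are chosen for this example only.

```python
from typing import NamedTuple, Optional, Sequence

class Task(NamedTuple):
    r: float  # ready time
    c: float  # computation time
    d: float  # deadline

def finish_time(sequence: Sequence[Task], start: float = 0.0) -> Optional[float]:
    """Run the tasks non-preemptively in the given order.

    Returns the finish time of the last task, or None if some task
    misses its deadline (the sequence is then infeasible).
    """
    t = start
    for task in sequence:
        t = max(t, task.r) + task.c     # wait for the ready time, then run to completion
        if t > task.d:
            return None
    return t
```

For example, the tasks (0, 2, 3) and (1, 2, 5) are feasible in that order with finish time 4, while the reverse order makes the first task miss its deadline.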

The approaches adopted to date for scheduling algorithms can be generally classified
into two categories. One approach is to assign priorities to tasks so that the tasks can be
scheduled according to their priorities [1, 7, 8, 10, 12, 15, 14]. This approach is called priority
based scheduling. The priority can be determined by deadline, execution time, resource
requirement, laxity, period, or can be programmer-defined [4]. The other is the time based
scheduling approach [9, 13]. A time based scheduler generates as an output a calendar which
specifies the time instants at which the tasks start and finish.

Generally speaking, scheduling for aperiodic task sets without preemption is NP-complete
[3]. Due to the intractability, several search algorithms [11, 17, 19, 20] have been proposed for
computing optimal or suboptimal schedules. Analytic techniques may also be used for optimal
scheduling. A dominance concept was proposed by Erschler et al. [2] to reduce the search
space for checking the feasibility of task sets. They explored the relations among the tasks
and determined the partial orderings of feasible schedules. Yuan and Agrawala [18] proposed
decomposition methods to substantially reduce the search space based on the dominance
concept. A task set is decomposed into subsets so that each subset can be scheduled
independently. A super sequence is constructed to reduce the search space further. Saksena and
Agrawala [13] investigated the technique of temporal analysis serving as a pre-processing
stage for scheduling. The idea is to modify the windows of two partially ordered tasks
which are generated by the temporal relations so that more partial orderings of tasks may
be generated recursively.

The time based model is employed by several real-time operating systems currently being
developed, including MARUTI [5], MARS [6], and Spring [16]. In this paper, we study an
analytic approach to optimal scheduling under the time based model. When complicated
timing constraints and task interdependency are taken into consideration, the schedulability
analysis of priority based scheduling algorithms becomes much more difficult. With the analytic
approach, we believe that the time based scheduling algorithm and analysis require
reasonable amounts of computation to produce a feasible schedule.

The rest of this paper is organized as follows. In section 2, we describe how to compute
the optimal schedule for a sequence. In section 3, releasing the restriction of total ordering
on a sequence, we present the approach to computing the optimal schedule for a task set.
Related theorems are also presented. In section 4, a simulation experiment is conducted to
compare the performance of different algorithms. The last section presents our conclusions.

2  Scheduling  a  Sequence 

The size of a sequence (task set) is the number of tasks in the sequence (task set), and is denoted by $|S|$ ($|\Gamma|$). A sequence $S$ is feasible if all tasks in $S$ are executed in the order of the sequence and the timing constraints are satisfied. For convenience, we further define an instance, $I$, to be a sequence such that $|I| = |\Gamma|$. We denote the instance $I$ by

$$I = (T_1^I, T_2^I, \ldots, T_n^I).$$

Figure 1: An instance $I = (T_1^I, T_2^I, \ldots)$

Notice that $\{\}$ is used to represent a task set, and $()$ a sequence. Let $T_i$ and $T_j$ be two tasks belonging to sequence $S$. If $T_i$ is located before $T_j$ in the sequence $S$, we say that $(T_i, T_j)$ conforms to $S$. A sequence $S_1$ conforms to a sequence $S_2$ if, for any $T_i$ and $T_j$, $(T_i, T_j)$ conforming to $S_1$ implies $(T_i, T_j)$ conforming to $S_2$. We use $\sigma(k)$ to represent the optimal schedule of $(T_1^I, T_2^I, \ldots, T_k^I)$ in the sense that for any feasible sequence $S$ conforming to $(T_1^I, T_2^I, \ldots, T_k^I)$, either

$$|S| < |\sigma(k)|,$$

or

$$|S| = |\sigma(k)| \text{ and } f_S \ge f_{\sigma(k)}, \qquad (1)$$

where $f_S$ and $f_{\sigma(k)}$ are the finish times of $S$ and $\sigma(k)$ respectively. $\sigma(k)$ is thus the optimal schedule for the first $k$ tasks of $I$. The optimal schedule for an instance $I$ can thus be represented by $\sigma(n)$. For simplicity, let $u_k = |\sigma(k)|$. In this section, we will discuss the scheduling for an instance. However, the approach is generally applicable to any sequence.

2.1  Preliminary 

We assume that $r_i + c_i \le d_i$ holds for each task $T_i$ in the task set $\Gamma$. At first glance, one may attempt to compute $\sigma(k)$ based on $\sigma(k-1)$. However, on careful examination, we find that merely computing $\sigma(k-1)$ does not suffice to compute $\sigma(k)$. This is illustrated by the example in Figure 1. From this example, we can obtain

$$\sigma(1) = (T_1^I),$$

$$\sigma(2) = (T_1^I, T_2^I).$$

At the next step, $\sigma(2) \oplus (T_3^I)$ is not feasible, where the operator $\oplus$ means concatenation of two sequences. One task must be rejected, which is $T_3^I$ in this case. Hence, we get

$$\sigma(3) = \sigma(2) = (T_1^I, T_2^I).$$

A problem arises at the next step. $\sigma(3) \oplus (T_4^I)$ is not feasible either. If we try to fix it by taking a task off this sequence, the result is

$$\sigma'(4) = \sigma(3) = (T_1^I, T_2^I).$$

However, the correct result should be

$$\sigma(4) = (T_1^I, T_4^I).$$

Although both $\sigma'(4)$ and $\sigma(4)$ are of the same size, the latter comes with a shorter finish time, which becomes significant at the next step. We get

$$\sigma(n) = \sigma(5) = \sigma(4) \oplus (T_5^I) = (T_1^I, T_4^I, T_5^I).$$

However, with $\sigma'(4)$, we would have

$$\sigma'(5) = \sigma'(4) = (T_1^I, T_2^I).$$

This example shows that merely computing $\sigma(k-1)$ does not suffice to compute $\sigma(k)$. When $\sigma(k-1)$ is obtained, it cannot be predicted which tasks would be included in $\sigma(k)$. The approach has to be modified as follows.
2.2 Sequence-Scheduler Algorithm

We denote by $S(k,j)$ the sequence such that $S(k,j)$ conforms to $(T_1^I, \ldots, T_k^I)$ and $|S(k,j)| = j$, where $j \le |\sigma(k)|$. $S(k,j)$ represents any sequence of $j$ tasks picked up from the first $k$ tasks of the instance. We further define a sequence, denoted by $\sigma(k,j)$, to be the optimal schedule with degree $j$ for $(T_1^I, T_2^I, \ldots, T_k^I)$ in the sense that for any feasible sequence $S(k,j)$, we have

$$f_{\sigma(k,j)} \le f_{S(k,j)}.$$

Notice that $\sigma(k)$ is an abbreviation of $\sigma(k, u_k)$. If a sequence $S(k,j)$ is not feasible,

$$f_{S(k,j)} = \infty.$$

We would like to compute $\sigma(k,j)$ based on $\sigma(k-1,j')$, where $j' \le j \le |\sigma(k)|$. The basic idea is as follows. We know $\sigma(k,j)$ either contains $T_k^I$ or not. If so, then the other $j-1$ tasks are picked up from the first $k-1$ tasks, and $\sigma(k-1,j-1)$ is one of the best choices. In this case, $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$. If $\sigma(k,j)$ does not contain $T_k^I$, all of the $j$ tasks should be picked up from the first $k-1$ tasks, and $\sigma(k-1,j)$ is one of the best choices. In this case, $\sigma(k,j) = \sigma(k-1,j)$. Whether to take $T_k^I$ or not is determined by comparing which one of the two sequences comes with a shorter finish time. Therefore, $\sigma(k,j)$ can be determined by $\sigma(k-1,j-1)$ and $\sigma(k-1,j)$. The computation of $\sigma(k-1,j)$ is in turn based on $\sigma(k-2,j-1)$ and $\sigma(k-2,j)$. In general, at each step $k$, we need to compute $\sigma(k,j)$ for $j = 1, \ldots, |\sigma(k)|$. The algorithm Sequence-Scheduler in Figure 2 formalizes this idea. It is worth mentioning that the condition of the "while" statement in the algorithm is designed to let $j$ increase from 1 through $|\sigma(k)|$. The correctness is verified in the next section.
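
The recurrence described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the `Task` record, the helper `append_finish`, and the convention that the empty schedule finishes at time 0 are assumptions made for the example. As in the text, an infeasible sequence is given an infinite finish time.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Task:
    name: str
    r: float  # ready time
    c: float  # computation time
    d: float  # deadline


def append_finish(f, task):
    """Finish time after appending `task` to a schedule finishing at `f`.

    Returns inf when the schedule is already infeasible or the deadline is missed.
    """
    if f == inf:
        return inf
    end = max(f, task.r) + task.c
    return end if end <= task.d else inf


def sequence_scheduler(instance):
    """Return (finish time, tasks) of the optimal schedule sigma(n) of an instance."""
    # sigma[j] holds sigma(k, j) as (finish time, tuple of tasks); sigma(0, 0) is empty.
    sigma = [(0.0, ())]
    for task in instance:
        prev, u_prev = sigma, len(sigma) - 1
        cur = [(0.0, ())]
        j = 1
        while j <= u_prev + 1:
            f_with = append_finish(prev[j - 1][0], task)      # sigma(k-1, j-1) + (T_k)
            f_without = prev[j][0] if j <= u_prev else inf    # sigma(k-1, j)
            if f_with == inf and f_without == inf:
                break                                         # u_k = u_{k-1}
            if f_with <= f_without:
                cur.append((f_with, prev[j - 1][1] + (task,)))
            else:
                cur.append(prev[j])
            j += 1
        sigma = cur
    return sigma[-1]
```

For a hypothetical five-task instance with (ready, computation, deadline) values T1 = (0, 2, 2), T2 = (2, 3, 6), T3 = (3, 3, 7), T4 = (2, 1, 4) and T5 = (3, 2, 5), the sketch returns the schedule (T1, T4, T5), whereas extending only the previous optimum (T1, T2) would stop at two tasks. Keeping every degree j costs O(n) entries per step, which is what lets a shorter earlier schedule be picked up later.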

2.3  Verification  of  Sequence-Scheduler  Algorithm 

The proof of the correctness of the algorithm, along with some related lemmas, is given below.

Lemma 1 Let $S_1$ and $S_2$ be two sequences such that $f_{S_1} \le f_{S_2}$. If $S_2 \oplus (T_x)$ is feasible, then $f_{S_1 \oplus (T_x)} \le f_{S_2 \oplus (T_x)}$.
Algorithm Sequence-Scheduler:

Input: an instance $I = (T_1^I, T_2^I, \ldots, T_n^I)$

Output: the optimal schedule $\sigma(n) = \sigma(n, u_n)$ for $I$

    σ(0,0) := ∅;  u_0 := 0
    for k := 1, 2, ..., n
        j := 1
        while (j ≤ u_{k-1}) or ((j = u_{k-1} + 1) and (σ(k-1, j-1) ⊕ (T_k^I) is feasible)) do
            if f_{σ(k-1,j-1) ⊕ (T_k^I)} ≤ f_{σ(k-1,j)} then
                σ(k,j) := σ(k-1,j-1) ⊕ (T_k^I)
            else
                σ(k,j) := σ(k-1,j)
            endif
            j := j + 1
        endwhile
        u_k := j - 1
    endfor

Figure 2: Sequence-Scheduler Algorithm
Proof: This is straightforward via the following equations.

$$f_{S_1 \oplus (T_x)} = \max(f_{S_1}, r_x) + c_x \le \max(f_{S_2}, r_x) + c_x = f_{S_2 \oplus (T_x)}.$$

□
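
The finish-time rule used in this proof, $f_{S \oplus (T_x)} = \max(f_S, r_x) + c_x$, is easy to check numerically. The sketch below is only an illustration: the function name `finish_time`, the (ready, computation) task tuples, and the sample values are assumptions, and deadlines are ignored because the lemma already presumes feasibility.

```python
def finish_time(tasks, start=0.0):
    """Finish time of executing (ready, computation) tasks in the given order."""
    f = start
    for r, c in tasks:
        f = max(f, r) + c   # a task starts when both the processor and the task are ready
    return f


s1 = [(0, 2), (1, 1)]   # finishes at time 3
s2 = [(0, 2), (4, 1)]   # finishes at time 5
tx = (4, 2)             # task appended to both sequences

# Lemma 1: the sequence that finishes earlier still finishes no later after appending tx.
assert finish_time(s1) <= finish_time(s2)
assert finish_time(s1 + [tx]) <= finish_time(s2 + [tx])
```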

Corollary 1 Let $S_1$ and $S_2$ be two sequences such that $f_{S_1} \le f_{S_2}$. If $S_2 \oplus S_3$ is feasible, where $S_3$ is another sequence, then $f_{S_1 \oplus S_3} \le f_{S_2 \oplus S_3}$.

Proof: This is a direct result of applying Lemma 1 repeatedly through the tasks in $S_3$. □

Lemma 2 $u_k = u_{k-1}$ or $u_k = u_{k-1} + 1$.

Proof: It is obvious that $u_k \ge u_{k-1}$, where both $u_k$ and $u_{k-1}$ are integers. Let us assume that $u_k = u_{k-1} + \alpha$, and $\alpha \ge 2$. We are going to show that this assumption does not hold. We know $\sigma(k, u_k)$ either contains $T_k^I$ or not. If $\sigma(k, u_k)$ contains $T_k^I$, we can represent $\sigma(k, u_k)$ as $S(k-1, u_k - 1) \oplus (T_k^I)$, by picking up a proper sequence $S(k-1, u_k - 1)$. However, from the assumption above, we have $u_{k-1} = u_k - \alpha < u_k - 1 = |S(k-1, u_k - 1)|$. This contradicts the definition of $u_{k-1}$. On the other hand, if $\sigma(k, u_k)$ does not contain $T_k^I$, we can represent $\sigma(k, u_k)$ as $S(k-1, u_k)$. We have $u_{k-1} = u_k - \alpha < u_k$, which is a contradiction. The assumption thus does not hold. Therefore, we have $\alpha \le 1$. □

From this lemma, $\sigma(k,j)$ does exist for $j \le u_{k-1}$. Furthermore, in the algorithm, $j = u_{k-1} + 1$ is tested to see if $u_k = u_{k-1} + 1$.

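Theorem 1 For $k = 1, 2, \ldots, n$, and $j = 1, 2, \ldots, u_k$, if $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{\sigma(k-1,j)}$, then $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$; otherwise, $\sigma(k,j) = \sigma(k-1,j)$.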

Proof: The proof is by induction on $k$. When $k = 1$, $I_1 = (T_1^I)$. Since $u_1 \le 1$ and $(T_1^I)$ is feasible, $\sigma(1,1) = (T_1^I)$. It is easy to come up with the same result through this theorem. Thus the base case holds. We assume that we can compute $\sigma(k-1,j)$, for $j = 1, 2, \ldots, u_{k-1}$, in the same way, and consider the case of $k$. Let us first bring forward three basic equations. Since $f_{\sigma(k-1,j-1)} \le f_{S(k-1,j-1)}$, the following equation holds by Lemma 1:

$$f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j-1) \oplus (T_k^I)}. \qquad (2)$$

By induction hypothesis on $\sigma(k-1,j)$ such that $f_{\sigma(k-1,j)} \le f_{S(k-1,j)}$, we have

$$\text{if } f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{\sigma(k-1,j)}, \text{ then } f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j)}. \qquad (3)$$

From Equation 2, we have

$$\text{if } f_{\sigma(k-1,j)} < f_{\sigma(k-1,j-1) \oplus (T_k^I)}, \text{ then } f_{\sigma(k-1,j)} < f_{S(k-1,j-1) \oplus (T_k^I)}. \qquad (4)$$

From Lemma 2, we know either $u_k = u_{k-1} + 1$ or $u_k = u_{k-1}$. The two cases are discussed below.

Case I: $u_k = u_{k-1} + 1$. We first discuss the situation when $j = 1, 2, \ldots, u_k - 1$. We know that a feasible sequence $S(k,j)$ is either in the form of $S(k-1,j)$ or $S(k-1,j-1) \oplus (T_k^I)$. If $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{\sigma(k-1,j)}$, then $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j)}$ by Equation 3. In addition, $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j-1) \oplus (T_k^I)}$ by Equation 2. This means $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k,j)}$ for any feasible sequence $S(k,j)$. Consequently, $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$, which justifies the theorem. On the other hand, if $f_{\sigma(k-1,j-1) \oplus (T_k^I)} > f_{\sigma(k-1,j)}$, then $f_{\sigma(k-1,j)} < f_{S(k-1,j-1) \oplus (T_k^I)}$ by Equation 4. In addition, $f_{\sigma(k-1,j)} \le f_{S(k-1,j)}$ by induction hypothesis on $\sigma(k-1,j)$. So $f_{\sigma(k-1,j)} \le f_{S(k,j)}$ for any feasible sequence $S(k,j)$. In this case, $\sigma(k,j) = \sigma(k-1,j)$, which justifies the theorem.

Then we discuss the situation when $j = u_k$. Since $u_k = u_{k-1} + 1$, it is clear that $T_k^I$ belongs to $\sigma(k)$; otherwise, we need to pick up $u_{k-1} + 1$ tasks from $I_{k-1}$ to make a feasible sequence, which violates the definition of $u_{k-1}$. Therefore, $\sigma(k,j)$ can be expressed as $S(k-1, u_{k-1}) \oplus (T_k^I)$ by picking up a proper sequence $S(k-1, u_{k-1})$. Note that $u_{k-1} = j - 1$. By Equation 2, we have $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j-1) \oplus (T_k^I)}$ for any such sequence. Thus, $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$. Now let us check the theorem. The sequence $\sigma(k-1,j) = \sigma(k-1, u_{k-1}+1)$ is not feasible; thus its finish time is $\infty$. The condition $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{\sigma(k-1,j)}$ is satisfied. So $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$. This justifies the theorem.

Case II: $u_k = u_{k-1}$. The reasoning follows the discussion of the first part in Case I. □
3 Scheduling a Task Set

In this section we discuss how to schedule a task set by using Sequence-Scheduler. The optimal schedule, $\rho$, of a task set is defined as follows: for any feasible sequence $S$ consisting of tasks in $\Gamma$, we have either

$$|S| < |\rho|,$$

or

$$|S| = |\rho| \text{ and } f_S \ge f_\rho.$$

For simplicity, we use optimal schedule to represent the optimal schedule of the task set, when there is no confusion. Note that the optimal schedule of the task set is the best one of the optimal schedules of all instances in the task set. Erschler et al [2] proposed the dominance concept to reduce the number of permutations that should be examined for the feasibility test of a task set. Yuan and Agrawala [18] proposed the super sequence to further reduce the search space for testing the feasibility of a task set. In this section, we show that for our optimization problem, the super sequence provides a valid and pruned search space. In other words, there exists one optimal schedule which conforms to an instance in the super sequence of the task set. Thus we may use Sequence-Scheduler to schedule the instances extracted from the super sequence to derive the optimal schedule. There may exist more than one optimal schedule for a task set. Our interest is in how to derive one of them.

3.1  Super  Sequence 

Temporal relations between two tasks $T_i$ and $T_j$ are summarized in the following. They are illustrated by Figure 3.

• leading: $T_i \prec T_j$, if $r_i \le r_j$, $d_i \le d_j$, but both of the equalities do not hold at the same time.

• matching: $T_i \parallel T_j$, if $r_i = r_j$, $d_i = d_j$.

• containing: $T_i \sqcup T_j$, if $r_i < r_j$, $d_i > d_j$.
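
The three relations above partition every pair of task windows. The following sketch classifies a pair and identifies the tasks that contain no other task; the tuple layout (ready time, deadline) and the function names are assumptions chosen for illustration.

```python
def relation(ti, tj):
    """Temporal relation of window ti to window tj, each given as (ready, deadline)."""
    (ri, di), (rj, dj) = ti, tj
    if ri == rj and di == dj:
        return "matching"        # Ti || Tj
    if ri <= rj and di <= dj:
        return "leading"         # Ti leads Tj
    if rj <= ri and dj <= di:
        return "led"             # Tj leads Ti
    if ri < rj and di > dj:
        return "containing"      # Ti contains Tj
    return "contained"           # Tj contains Ti


def tasks_containing_nothing(windows):
    """Windows that contain no other window in the list."""
    return [t for t in windows
            if not any(relation(t, o) == "containing" for o in windows)]
```

For example, `relation((0, 10), (2, 6))` is `"containing"`, while `relation((0, 6), (0, 5))` is `"led"` because the second window has the same ready time and an earlier deadline.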
Figure 3: (i) $T_i \prec T_j$; (ii) $T_i \parallel T_j$; (iii) $T_i \sqcup T_j$


A task $h$ is called a top task if there is no task contained by $h$. A task is called a nontop task if it contains at least one task. Assume that we have $t$ top tasks in the task set, $h_1, h_2, \ldots, h_t$ respectively. Denote by $M_k$ the set of tasks that contain the top task $h_k$. We say that $T_i$ is weakly leading to $T_j$, denoted by $T_i \trianglelefteq T_j$, if $T_i \prec T_j$ or $T_i \parallel T_j$. [...]

The dominance concept was originally developed by Erschler et al [2] to reduce the search space for testing the feasibility of a task set. The idea is extended with the super sequence proposed by Yuan and Agrawala [18]. An instance $I$ dominates an instance $I'$ if

$$I' \text{ feasible} \Rightarrow I \text{ feasible}.$$

It can be considered that $I$ is a better candidate as a feasible schedule than $I'$. A dominant instance is an instance such that for each possible instance $I$ of the task set, if $I$ dominates the dominant instance, then the dominant instance [...]. A dominant instance can be considered as the best candidate of the [...]. A set of dominant instances is said to be a dominant set if, whenever $I$ does not belong to the dominant set, there exists a dominant instance in the dominant set such that the dominant instance dominates $I$.
A super sequence $A$ serves similarly as a dominant set in that there exists a dominant instance in the super sequence; and it is more appropriate for solving our problem. A super sequence is a sequence of tasks, where duplicates of tasks are allowed. The purpose is to extract instances from the super sequence for scheduling. The super sequence is constructed according to the dominant rules [2, 18] described below. Whenever a task satisfies one of the conditions specified by the rules, a duplicate of the task is inserted into the super sequence. Note that duplicates can only be generated for nontop tasks. The top tasks appear once and only once in the super sequence.

Rule R1: Let $T_\alpha$ and $T_\beta$ be any two top tasks. If $T_\alpha \prec T_\beta$, then $T_\alpha$ is positioned before $T_\beta$. If $T_\alpha \parallel T_\beta$, the order of the two top tasks is determined arbitrarily.

A unique order of the top tasks can thus be determined for the super sequence. Let us denote the sequence composed of the top tasks by $H = (h_1, h_2, \ldots, h_t)$. The rule implies that if $T_\alpha$ is positioned before $T_\beta$ in the super sequence, then $T_\alpha \trianglelefteq T_\beta$. So $h_1 \trianglelefteq h_2 \trianglelefteq \ldots \trianglelefteq h_t$.
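
Because no top task contains another, sorting top tasks by ready time and breaking ties by deadline yields an order that satisfies rule R1. The sketch below illustrates this; the (name, ready, deadline) tuples and function names are assumptions, and the input is presumed to consist of top tasks only.

```python
def order_top_tasks(tops):
    """Order top tasks (name, ready, deadline) so that each weakly leads the next."""
    return sorted(tops, key=lambda t: (t[1], t[2]))


def weakly_leads(a, b):
    """True when a leads or matches b."""
    return a[1] <= b[1] and a[2] <= b[2]


tops = [("h3", 6, 12), ("h1", 1, 4), ("h2", 3, 8)]
ordered = order_top_tasks(tops)
# Every consecutive pair in the result satisfies the weakly leading relation.
assert all(weakly_leads(a, b) for a, b in zip(ordered, ordered[1:]))
```

If two tasks in the input did stand in a containing relation, the sorted order would violate `weakly_leads`, which is a cheap way to detect that a nontop task slipped in.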

Rule R2:

(1) A nontop task can be positioned before the first top task $h_1$ only when it contains $h_1$.

(2) A nontop task can be positioned after the last top task $h_t$ only when it contains $h_t$.

(3) A nontop task can be positioned between $h_k$ and $h_{k+1}$ only when it contains $h_k$ or $h_{k+1}$.

The $t$ top tasks delimit the super sequence into $t + 1$ regions by rule R1. Now we have $t + 1$ subsets of nontop tasks separated by the $t$ top tasks by rule R2. Generally speaking, a nontop task has more than one possible location. Denote the $k$th subset by $A_k$, which is between top tasks $h_k$ and $h_{k+1}$. From rule R2, it can be deduced that

$$A_k = B_{k,\overline{k+1}} \cup B_{k,k+1} \cup B_{\overline{k},k+1}, \text{ where} \qquad (5)$$

$$B_{k,\overline{k+1}} = M_k \cap \overline{M_{k+1}}, \quad B_{\overline{k},k+1} = \overline{M_k} \cap M_{k+1}, \quad B_{k,k+1} = M_k \cap M_{k+1}.$$

The next rule specifies the order of the tasks within each subset.
Rule R3: In each subset $A_k$, for $k = 0, 1, \ldots, t$,

(1) the tasks in $B_{k,\overline{k+1}}$ are ordered according to their deadlines, and tasks with the same deadlines are ordered arbitrarily,

(2) the tasks in $B_{k,k+1}$ are ordered arbitrarily,

(3) the tasks in $B_{\overline{k},k+1}$ are ordered according to their ready times, and tasks with the same ready times are ordered arbitrarily,

(4) the tasks in $B_{k,\overline{k+1}}$ are positioned before those in $B_{k,k+1}$, which in turn are positioned before those in $B_{\overline{k},k+1}$.

Now we are ready to construct the super sequence with these three rules. Top tasks are first picked out and ordered, forming $t + 1$ regions. In each region, there is a subsequence of nontop tasks. An instance extracted out of the super sequence is one that conforms to the super sequence without duplication of tasks. Let $q$ be the number of top tasks that a nontop task contains. The number of possible regions the nontop task can fall into is $q + 1$. The number of instances in the super sequence thus sums up to

$$N = \prod_q (q + 1)^{n_q},$$

where $n_q$ is the number of nontop tasks which contain $q$ top tasks. Compared with an exhaustive search which takes up to $n!$ instances (permutations) into account, the super sequence generally leads to a smaller set. Notice that it takes $O(n)$ time to check if an instance is feasible. Hence, the time complexity of the feasibility test for the task set is $O(N \cdot n)$.
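
To get a feel for the size of this search space, the count can be computed directly from the task windows. The sketch below is an illustration only: the (ready, deadline) tuples, the function names, and the sample task set are assumptions, and a task is treated as a top task exactly when it contains no other task.

```python
from math import factorial, prod


def contains(a, b):
    """Window a = (ready, deadline) strictly contains window b."""
    return a[0] < b[0] and a[1] > b[1]


def count_instances(windows):
    """Number of instances in the super sequence: product of (q + 1) over nontop tasks."""
    is_top = [not any(contains(t, o) for o in windows) for t in windows]
    tops = [t for t, top in zip(windows, is_top) if top]
    return prod(sum(contains(t, h) for h in tops) + 1
                for t, top in zip(windows, is_top) if not top)


# Three top tasks plus two nontop tasks containing 3 and 1 top tasks respectively.
windows = [(1, 4), (3, 8), (6, 12), (0, 13), (2, 9)]
n_instances = count_instances(windows)      # (3 + 1) * (1 + 1)
n_permutations = factorial(len(windows))    # exhaustive search
```

For this hypothetical task set the super sequence yields 8 instances, against 120 permutations for an exhaustive search.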


3.2  Leading  Theorem 

The super sequence is not only useful in testing the feasibility of a task set; we will show that it is also useful in reducing the number of instances to be examined in order to obtain the optimal schedule of a task set. We will show that there exists at least one optimal schedule which conforms to an instance in the super sequence $A$. Hence, it suffices to check through $A$ to obtain the optimal schedule of $\Gamma$.

It is worth noting that the top tasks in $\rho$ may not be the same as the top tasks of $\Gamma$. This arises because some of the top tasks of $\Gamma$ may be rejected, introducing new top tasks in $\rho$. Before proceeding to verify the rules for the super sequence, we will first introduce the Leading Theorem. It serves as the base for further analysis in the Dominance Theorem and Conformation Theorem to be described later. The Leading Theorem states that under a certain condition we can adjust the order of tasks to satisfy the Weakly Leading Condition, to be defined below, without introducing a schedule with a greater finish time.

Assume that $S$ is a feasible sequence, with $L_{pre}$, $L$, and $L_{post}$ subsequences of $S$ such that

$$S = L_{pre} \oplus L \oplus L_{post}.$$

Let us denote $L$ by

$$L = (T_\beta, T_{x_1}, T_{x_2}, \ldots, T_{x_u}, T_\alpha),$$

where $u \ge 0$. A frame $F$ is defined to be a time interval characterized by a beginning time $b_F$ and an ending time $e_F$. We say that $F$ is a frame corresponding to $L$, if $b_F = s_\beta$ and $e_F = f_\alpha$, where $s_\beta$ is the starting time of $T_\beta$, and $f_\alpha$ is the finish time of $T_\alpha$.

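Theorem 2 (Leading Theorem) Assume that $S = L_{pre} \oplus L \oplus L_{post}$ is a feasible sequence, where $L = (T_\beta, T_{x_1}, T_{x_2}, \ldots, T_{x_u}, T_\alpha)$. Let $F$ be a frame corresponding to $L$. If $T_\alpha \prec T_\beta$, and there does not exist a task $T_{x_i}$, $1 \le i \le u$, such that $F \sqcup T_{x_i}$, then there exists a sequence $\hat{L}$ which is a permutation of $L$ such that

(i) $(T_\alpha, T_\beta)$ conforms to $\hat{L}$, and

(ii) $f_{L_{pre} \oplus \hat{L} \oplus L_{post}} \le f_{L_{pre} \oplus L \oplus L_{post}}$.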


Before  we  can  proceed  to  prove  the  theorem,  the  following  definition  is  useful. 

Weakly Leading Condition: a sequence $S = (T_1^S, T_2^S, \ldots, T_{|S|}^S)$ satisfies the Weakly Leading Condition if $T_1^S \trianglelefteq T_2^S \trianglelefteq \ldots \trianglelefteq T_{|S|}^S$.

Lemma 3 Let $S$ be a sequence satisfying the Weakly Leading Condition. If $(T_i, T_j)$ conforms to $S$ and $T_i \parallel T_j$, then all tasks located between $T_i$ and $T_j$ in $S$ must match $T_i$ and $T_j$.
Proof: For any task $T_x$ located between $T_i$ and $T_j$, according to the definition of the Weakly Leading Condition, we have $r_i \le r_x \le r_j$ and $d_i \le d_x \le d_j$. Since $T_i \parallel T_j$, $r_i = r_j$ and $d_i = d_j$. Therefore, we have $r_i = r_x = r_j$ and $d_i = d_x = d_j$. So, $T_x$ matches $T_i$ and $T_j$. □

To obtain $\hat{L}$, let us modify the tasks in $L$ in the following way. If the ready time of a task is less than $b_F$, then its ready time is set to $b_F$. If the deadline of a task is greater than $e_F$, then its deadline is set to $e_F$. The computation times remain unchanged. Let $L'$ be a sequence consisting of the modified tasks with the same order as $L$, i.e.,

$$L' = (T'_\beta, T'_{x_1}, T'_{x_2}, \ldots, T'_{x_u}, T'_\alpha).$$

Since $T_\alpha \prec T_\beta$, $d_\beta \ge d_\alpha \ge f_\alpha = e_F$. So $d'_\alpha = d'_\beta = e_F$. Also $r_\alpha \le r_\beta \le s_\beta = b_F$, so $r'_\alpha = r'_\beta = b_F$. This is illustrated in Fig 4 (ii).
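
The window modification described above amounts to clamping each task window to the frame. A minimal sketch follows; the (ready, computation, deadline) tuple layout, the function name, and the sample values are assumptions for illustration.

```python
def clamp_to_frame(tasks, b_f, e_f):
    """Clamp (ready, computation, deadline) windows to the frame [b_f, e_f].

    Ready times earlier than b_f are raised to b_f, deadlines later than e_f
    are lowered to e_f, and computation times are left unchanged.
    """
    return [(max(r, b_f), c, min(d, e_f)) for r, c, d in tasks]


original = [(2, 1, 20), (5, 2, 9), (0, 3, 12)]
modified = clamp_to_frame(original, b_f=4, e_f=11)
# modified == [(4, 1, 11), (5, 2, 9), (4, 3, 11)]
```

A window lying entirely inside the frame, such as the second task here, is unchanged, while every task whose window covers the whole frame ends up with the same modified window and therefore matches the others.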

Note that swapping $T'_\beta$ and $T'_\alpha$ in the sequence does not result in a feasible sequence in this example. It is essential that we adjust the order of the tasks located between them. Let
i'  be  a  sequence  which  is  a  permutation  of  V  and  satisfies  the  Weakly  Leaning  Conition, 
and  to  which  XX)  Furthermore,  V  satisfies  an  even  stronger  condition.  H  J, 

is  cositioned  before  tJ-'  in  then  Tf  <  Tf ;  h,  furthermore,  Ti  ||  T,  .  the  corresponoing 
tasks  TV  and  Ti'  satisfies  that  <1  Tj:' .  The  idea  of  such  arrangement  is  that  wnen 
interchanging  and  X  we  do  not  produce  a  new  reversed  pair  like  them.  By  r^ers^  P-r 
we  meaa  for  example  -<  Tp  is  positioned  before  m  the  sequence.  So,  u  (  ,•  ,,  ) 

conforms  to  l\  the  corresponding  tasks  satisfies  the  condition  that  either  T,-, 

T,^'  U  Tjt'  or  Tj;'  U  Tjr'.  One  possibility  of  L'  is  illustrated  in  Fig  4  (iii),  or 

The existence of such a sequence is proved later. Finally, $\hat{L}$ can be a sequence with the same order as $\hat{L}'$, but the ready times and deadlines of the tasks are recovered to their original settings. This is illustrated in Fig 4 (iv). The figures give a rough idea of how the adjustment of task order can be made to satisfy the conditions described in the Leading Theorem. The proof of the Leading Theorem is given below.
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PKOor  (of  Leading  Tbeore^:  We  .ould  tet  sb^J  e  e^^tence 
,eady  times  and  deadlines  of  tbe  tasks  for  i  rs  done  rn  snob  a  way 
„e  It  aSected.  In  addition,  their  compntafon  trmes  remarn  tbe  same. 

is  feasible,  and 

/Lpr.®L' = /V.eir-  V  +  V 

We  can  obtain  V  iB  the  foUo^Bg  eists  because  there 

in  V  snch  that,  for  any  task  belonging  to  i  T,  ^  ^ibitiarily.  Tf 

are  no  containing  relations  among  the  mo  e  “  Continne  tbe  exchanging  process 

is  exchanged  with  tbe  task  locate  ]us  e  second  step,  tbe  second  task 

nntil  Tf  occnpies  tbe  first  location  in  tbe  '  ^‘^rtTl'  ex«;t  Tf'.  Xf  <  T. 

T}'  of  i'  is  tbe  task  in  I-  sncb  that,  to  -^.^^^^^ntifiroccnpies  the  second  location 
Exchange  with  its  left  neig  or  ^  ^33^. 

in  the  sequence.  At  the  ^  ^  Exchange  Tf  with  its  left  neighbor 

JL  belonging  to  V  except  Ti  roug  ,_i-.  ,  ^  seauence.  W^e  keep  performing  this 

task  consecutively  untilit  occupies^the it  oca.  ion  in  ^  ^  ^  ^  position  of 

operation  until  vre  finally  obtain  L  .  sertion  o  -  p  ^  belonging  to 

rri;T2.i  Sai  r  ““it. 

W'eakly  Leading  Condition.  t  ^ttstti a  all  tasks  located  between 

There isachance that  (Ts.Ta)  " any  afierence. 

and  Ta  must  match  ^  y  ,^lj5ch  makes  (Tp,  Tc)  conform  to  L' 

We  can  thus  exchange  the  position  o  p  “’r  rpi,'  i  <  ;  <  IL'l  T^‘  leads  to  or  matches 

P^ng  the  process  of  adiusting  the  ,,  Ee  described 

r.L“  ri”“  •.»  -« rf' » a. » ““ — 

with' a  shorter  or  equal  finish  time.  This  explains 


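$\hat{L}$ is a sequence with the same order as $\hat{L}'$, but the ready times and deadlines of the tasks are recovered to their original values. Each task in $\hat{L}$ can be started no later than the starting time of the same task in $\hat{L}'$. Consequently,

$$f_{L_{pre} \oplus \hat{L}} \le f_{L_{pre} \oplus \hat{L}'}.$$

By Corollary 1, we have

$$f_{L_{pre} \oplus \hat{L} \oplus L_{post}} \le f_{L_{pre} \oplus L \oplus L_{post}}.$$

□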


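Lemma 4 Assume that $S_1 \oplus S_2 \oplus (T_j) \oplus S_3$ is feasible, where $S_1$, $S_2$, and $S_3$ are sequences. If $T_j \trianglelefteq S_2$, then $f_{S_1 \oplus (T_j) \oplus S_2 \oplus S_3} \le f_{S_1 \oplus S_2 \oplus (T_j) \oplus S_3}$.

Proof: We will prove the lemma by induction on $|S_2|$. When $|S_2| = 0$, it is vacuously true. Assume that it is true when $|S_2| = k$. We would like to show that it is true when $|S_2| = k + 1$. Let $S_2 = (T_i) \oplus S_2'$, where $|S_2'| = k$; i.e.,

$$S_1 \oplus S_2 \oplus (T_j) \oplus S_3 = S_1 \oplus (T_i) \oplus S_2' \oplus (T_j) \oplus S_3.$$

We can view $S_1 \oplus (T_i)$ as a single sequence, and because $|S_2'| = k$, by induction hypothesis, we have

$$f_{S_1 \oplus (T_i) \oplus (T_j) \oplus S_2' \oplus S_3} \le f_{S_1 \oplus (T_i) \oplus S_2' \oplus (T_j) \oplus S_3}.$$

By definition,

$$f_{S_1 \oplus (T_i) \oplus (T_j)} = \max(\max(f_{S_1}, r_i) + c_i, r_j) + c_j = \max(f_{S_1} + c_i + c_j,\; r_i + c_i + c_j,\; r_j + c_j).$$

Since $T_j \trianglelefteq S_2$, which indicates that $T_j \trianglelefteq T_i$, we have $r_j \le r_i$, and $r_j + c_j \le r_i + c_i + c_j$. Hence

$$f_{S_1 \oplus (T_i) \oplus (T_j)} = \max(f_{S_1} + c_i + c_j,\; r_i + c_i + c_j).$$

On the other hand,

$$f_{S_1 \oplus (T_j) \oplus (T_i)} = \max(\max(f_{S_1}, r_j) + c_j, r_i) + c_i = \max(f_{S_1} + c_i + c_j,\; r_j + c_i + c_j,\; r_i + c_i).$$

Because $r_j + c_i + c_j \le r_i + c_i + c_j$, we have

$$f_{S_1 \oplus (T_j) \oplus (T_i)} \le f_{S_1 \oplus (T_i) \oplus (T_j)}.$$

By Corollary 1,

$$f_{S_1 \oplus (T_j) \oplus (T_i) \oplus S_2' \oplus S_3} \le f_{S_1 \oplus (T_i) \oplus (T_j) \oplus S_2' \oplus S_3}.$$

Therefore,

$$f_{S_1 \oplus (T_j) \oplus S_2 \oplus S_3} \le f_{S_1 \oplus S_2 \oplus (T_j) \oplus S_3}.$$

□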
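
The key step of this proof, that moving a task with the earlier ready time in front of its neighbour never increases the finish time, can be checked exhaustively over a small grid of values. The sketch below is illustrative only: the function names and the integer grid are assumptions, and deadlines are left out because the inequality depends only on ready and computation times.

```python
from itertools import product


def finish_time(start, tasks):
    """Finish time of executing (ready, computation) tasks in order from `start`."""
    f = start
    for r, c in tasks:
        f = max(f, r) + c
    return f


def swap_never_hurts(limit=4):
    """Check f(S1 + Tj + Ti) <= f(S1 + Ti + Tj) whenever r_j <= r_i, on a small grid."""
    for f, ri, ci, rj, cj in product(range(limit), repeat=5):
        if rj > ri:
            continue  # the lemma only applies when Tj is ready no later than Ti
        moved_first = finish_time(f, [(rj, cj), (ri, ci)])
        original = finish_time(f, [(ri, ci), (rj, cj)])
        if moved_first > original:
            return False
    return True
```

For instance, with $f_{S_1} = 0$, $T_i = (3, 2)$ and $T_j = (1, 1)$, the order $(T_i, T_j)$ finishes at 6 while $(T_j, T_i)$ finishes at 5.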


3.3 Dominance Theorem

The super sequence is constructed for the feasibility test of a task set. If a task set is feasible, we say that there exists a full schedule of the task set. There may exist more than one full schedule for a given task set. An optimal full schedule is a full schedule whose finish time is shortest among all the possible full schedules. Note that a full schedule is a feasible instance. In this section, we prove that if a task set is feasible, there exists an optimal full schedule conforming to the super sequence.¹ Hence, the super sequence provides a valid and pruned search space for deriving the optimal full schedule of a task set.

¹ In [2], Erschler et al.'s theorem implied a similar result: if a task set is feasible, there exists a full schedule in the dominant set. Our theorem further shows that there exists such a full schedule, with the minimum finish time among all full schedules, that conforms to the super sequence. We prove the existence of such an optimal full schedule in a more systematic way.
Theorem 3 Assume that the task set $\Gamma$ is feasible and $\rho$ is an optimal full schedule of $\Gamma$. Let $T_\alpha$ and $T_\beta$ be two top tasks of $\rho$ such that $T_\alpha \prec T_\beta$. If $(T_\beta, T_\alpha)$ conforms to $\rho$, then there exists another optimal full schedule $\rho'$ such that $(T_\alpha, T_\beta)$ conforms to $\rho'$.

Proof: $T_\alpha$ and $T_\beta$ are two top tasks. Let $F$ be a frame such that $b_F = s_\beta$ and $e_F = f_\alpha$. $T_\alpha \prec T_\beta$ means $b_F = s_\beta \ge r_\beta \ge r_\alpha$, and $e_F = f_\alpha \le d_\alpha$. If there exists a task $T_x$ such that $F \sqcup T_x$, then $T_\alpha \sqcup T_x$ too. This contradicts the fact that $T_\alpha$ is a top task. Hence $F$ cannot contain any task. By the Leading Theorem, there exists another sequence $\rho'$ such that $(T_\alpha, T_\beta)$ conforms to $\rho'$, and both $|\rho'| = |\rho|$ and $f_{\rho'} \le f_\rho$ hold, which means $\rho'$ is an optimal full schedule too. □

When two tasks match each other, it does not matter which task is executed first. This gives rise to the following Corollary.

Corollary 2 Assume that the task set $\Gamma$ is feasible and $\rho$ is an optimal full schedule of $\Gamma$. Also assume that $(h_1, \ldots, T_\beta, T_\alpha, \ldots, h_t)$, the subsequence of the top tasks in $\rho$, conforms to $\rho$. If $T_\alpha \trianglelefteq T_\beta$, then there exists another optimal full schedule $\rho'$ such that $(h_1, \ldots, T_\alpha, T_\beta, \ldots, h_t)$ conforms to $\rho'$.

Proof: Theorem 3 holds when $T_\alpha \trianglelefteq T_\beta$, because when two tasks match each other, the execution order of the two tasks is arbitrary. Also, by looking at the adjustment process of the Leading Theorem, we can find that the tasks located before and after $T_\alpha$ and $T_\beta$ have not been adjusted. This verifies the corollary. □


Corollary 3 Let $H = (h_1, h_2, \ldots, h_t)$ be top tasks of the task set $\Gamma$ such that $h_1 \trianglelefteq h_2 \trianglelefteq \ldots \trianglelefteq h_t$. If $\Gamma$ is feasible, there exists an optimal full schedule $\rho'$ to which $H$ conforms.

Proof: Since $\Gamma$ is feasible, there exists an optimal full schedule $\rho$. Let $K = (k_1, k_2, \ldots, k_t)$ be a sequence which is a permutation of $H$ such that $K$ conforms to $\rho$. We would like to adjust the order of the tasks in $K$ so that $K$ is transformed successively into $H$. We locate the corresponding task of $h_x$ in $K$, where $x$ is chosen in the order of 1 through $t$, and adjust
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„  „ 

to  tbe  sequence  M.  During  the  exists  an  optimal  full 

),„...,  fc.-i  =ue  in  positions  1, . . . ,  ■  J  Therefore,  there  exists  an 

schedule  to  u-hich  the  intermediate  resultant  s^uen 

optimal  full  schedule  p'  to  which  (hi ,  hr,  ■  •  ■ .  M  ““  °™  ’ 

,irtre::;ths“" 

Therefore  the  rule  R1  is  verified. 

Before  we  can  go  further,  the  Ts'mLt.) 

T,  a  nontop  task  of  a  sequence  S.  ^  p,ij  of  5  if  (T„  hi)  conforms  to 

conforms  to  S  and  T,  •<  hi-  ““  ^  J’  *’  number  of  disorder  pairs  m  £■ 

S  and  hi  T,.  The  disorder  dejree  of  S  is  detmeo 

such  that 

_ ,hfc-i)  conforms  to  Lyre, 

[hk^i,  ■  •  ■ -M)  conforms  to  Lj^si- 

Wle  have  the  foUor^S  prop=rries-.  ^  ,.  = 

(1)  if  T,  hi  and  I  =  (hi . -ia).  ^  conforms  to  i;  besides,  the 

L  eieir...  diat  i  is  a  permutation  of  L.  and  (J..  i) 

disorder  degree  of  p' is  less  than  th^  o^^  ^  p'  = 

Lei  Lt -“tL  i’is'a’permutation  of  i.  and  (hi.Ti)  conforms  to  h;  bendes. 

dimrder  degree  of  p' is  less  than  that  of  p 

PKOOr;  We  will  prove  (1)  first.  Let  F  be  located  between  hi  and 

tF  -  J=  <dz<  ^h-,- 
T_x such that F ⊔ T_u, then h_k ⊔ T_u too. This contradicts the fact that h_k is a top task.
The condition of the Leading Theorem is satisfied. Hence there exists a sequence L̂ which
is a permutation of L such that (T_x, h_k) conforms to L̂ and f_{L_pre ⊕ L̂ ⊕ L_post} ≤ f_p. Therefore,
L_pre ⊕ L̂ ⊕ L_post is also an optimal full schedule. Now let us look at Figure 4(iv). This is
the schedule after the adjustment process of the Leading Theorem is made. For the tasks
whose deadlines are less than e_F, they all lead to h_k. Note that the disorder is a relationship
defined between a nontop task and a top task, and h_k is the only top task in the frame
F. Therefore, no new disorder pairs with h_k are introduced among these tasks. Similarly,
for the tasks whose ready times are greater than  , they are all led by h_k. Therefore, no
new disorder pairs are introduced. As for the tasks otherwise, including T_x and whose
deadlines are greater than or equal to e_F and ready times less than or equal to  , they can
be ordered arbitrarily. Hence, we can position T_x before h_k and remove the disorder pairs,
if any, in these tasks by rearranging the proper orders for them. Thus the disorder degree of
L̂ is decremented by at least one. So the disorder degree of p' is less than that of p. Property
(2) holds for the same reason. □

Note that T_x does not match h_k or h_{k+1}; otherwise T_x is also a top task, which contradicts
our assumption.

Theorem 5 Assume that the task set Γ is feasible and p is an optimal full schedule of Γ.
Let h_1 ⪯ h_2 ⪯ ... ⪯ h_l be top tasks of p. There exists an optimal full schedule p' such that
(h_1, h_2, ..., h_l) conforms to p', and for any nontop task T_x such that (h_k, T_x, h_{k+1}) conforms
to p', either T_x ⊔ h_k or T_x ⊔ h_{k+1}.

Proof: Assume that T_x is a nontop task such that (h_k, T_x, h_{k+1}) conforms to p'. If T_x does
not contain h_k and T_x does not contain h_{k+1}, then either T_x ≺ h_k or h_{k+1} ≺ T_x. Hence,
either (h_k, T_x) or (T_x, h_{k+1}) is a disorder pair. We can eliminate it through Theorem 4, and
the disorder degree is decremented by at least one. Whenever there is a disorder pair in the
schedule, we can always apply Theorem 4 to eliminate it. The disorder degree is decremented
in this way until finally reaching zero. Hence, (h_k, T_x, h_{k+1}) conforming to p' implies that T_x
is not leading to h_k and h_{k+1} is not leading to T_x. The only possibilities are either T_x ⊔ h_k
or T_x ⊔ h_{k+1}. □


Theorem  5  confirms  the  validity  of  rules  R1  and  R2. 

Theorem 6 (Dominance Theorem) If a task set Γ is feasible, there exists an optimal
full schedule p such that p conforms to the super sequence of Γ.

PROOF: In Theorem 5, we verify the existence of the optimal full schedule such that the
top tasks are ordered according to their weakly leading relations, and the nontop tasks
are located in the appropriate subsets between top tasks. The only work left is to order
the nontop tasks in each subset. The adjustment process of the Leading Theorem can be
applied, and the resultant order is exactly specified by rule R3. So we can conclude that
there exists an optimal full schedule p which conforms to the super sequence. □


3.4  Conformation  Theorem 

If there is no task rejected in p, there exists an optimal full schedule conforming to the
super sequence of Γ. However, if Γ is not feasible, some tasks in Γ should be rejected. The
dominant rules are developed based on the assumption that no task is rejected. When tasks
are allowed to be rejected, the situation is different. The issue to be raised is whether the
decent solution for feasibility test can be applied to our optimization problem. Remember
that by optimization we mean that the number of rejected tasks in the schedule is minimized
and then the finish time of the schedule is also minimized. When a task set is feasible, the
optimal schedule is also the optimal full schedule. The difficulties are addressed in the next
section, followed by the approach and proof to solving the difficulties.
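
The two-level objective stated above (fewest rejected tasks first, earliest finish time second) can be read as a lexicographic comparison. The following Python fragment is only a sketch of that reading; the Schedule record and its field names are assumptions introduced here for illustration and do not come from the report.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Schedule:
    """Hypothetical summary of a schedule (field names are illustrative)."""
    accepted: Tuple[str, ...]  # accepted tasks in execution order
    finish_time: float         # completion time of the last accepted task


def objective(schedule: Schedule, total_tasks: int) -> Tuple[int, float]:
    """Key to minimize: number of rejected tasks first, finish time second."""
    rejected = total_tasks - len(schedule.accepted)
    return (rejected, schedule.finish_time)


def better(a: Schedule, b: Schedule, total_tasks: int) -> bool:
    """True if schedule a is strictly preferable to schedule b."""
    return objective(a, total_tasks) < objective(b, total_tasks)
```

Under this reading, a schedule that accepts more tasks always wins, and the finish time only breaks ties between schedules that accept the same number of tasks.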

3.4.1 Difficulties

We wish to make use of the super sequence as search space in our scheduling problem. The
difficulties are twofold.

Figure 5: The optimal schedule may not conform to the super sequence.

First, when a task is allowed to be rejected, the dominant rules specifying the relations
among containing tasks and contained tasks need to be modified, because the rules are
developed based on the assumption that no task is to be rejected. The new rules can
become quite complicated. Let's look at the example depicted in Figure 5. Assume that the
task set is

Γ = {T1, T2, T3, T4, T5},

and the super sequence of the task set is

A = (T1, T2, T3, T4, T2, T3, T1, T5).

The top tasks are T4 and T5. Γ is not feasible. We can see that
one possibility of the optimal schedule could be

p_0 = (T2, T1, T3, T5).

Apparently, p_0 does not conform to A. One may be able to show that another optimal
schedule (T1, T2, T3, T5) conforms to A. However, given an arbitrary task set, it is not
guaranteed that one is always able to do so. In the example, T4 is rejected. If we recompute
the super sequence without T4, we would get a different super sequence. The new super
sequence would be

A_0 = (T1, T2, T1, T3, T1, T5),

to which p_0 conforms. This gives a great difficulty. It seems that we need to check
against each task. Construct a super sequence in condition that the task is accepted, and
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■  „4  In  eenetal,  we  need 

.VM  the  tash  is  lejectea.  -ox^sidering 

•  - -iri 

leiectiBg  a  ^°^top  tasks  or  sax^e  example  iB 

,1  ^  nVication  and  positions  totally  different.  lx>°  makes  A-o 

■“““.  ~  i.  -  ».».>•■■  n  ’ 

top  ta^Ks.  ^  ^  xesmts  iB  — .  ptoce- 

=.  THe  Tt\tct\orx  oi  j^a  A\fficulties.  y 

would  be  descnbed  and 


je  vfonio  oc  - - 

T'  gT>d.  /^ 

..  •*•  •-  tTi  -  “:c 

Iiigiual  task  se.,  an  .  ^  ajiinown  to  ’  ^  saedule,  and  “ 

its  an  optlnsal  sched^e^^  (inB) 

axe  also  u^o«  “  ^  ^  of  I  Is  “i,,tl»al  full 

seqnence  o  o  _  ^  that  the  nnkno  'pheorem,  there  to 

aiSerent  ixorn  ^  .feasible,  hy  the  Domin  pxohlem  is  that  we  ppltniig 

"^“S.  a- ‘•“.’S'XiI. ^  »»“*■". 

corapnte  po  svrappit^S  ^  ^  ^^ressaiy,  so  as  to  ti  instance 

-  r'^s":  Id  to  -  of  - --:rertre  s^eaule 

ieduie  p  suck  tUt  p  Is^o  “  ^  apace  when  sdxedullns  r . 

IT  w  'the  sake  of  simpBcny,  nse  A  «  a  ® 

oi  A-  So  we  can  nse 
This example is so simplified that the existence of p can be verified by mere inspection.
However, the reasoning is far more complicated than it appears at the first glance. We are
going to prove in the following theorem that such an optimal schedule p that conforms to A
always exists. The corresponding lemmas are presented in the next section.

Theorem 7 (Conformation Theorem) Given a task set Γ = {T1, T2, ..., Tn}, there exists
an optimal schedule p such that p conforms to the super sequence A of Γ.

PROOF: Given any task set Γ, there exists at least one optimal schedule, which is unknown
to us. Assume that we need to reject u tasks from Γ to make a feasible schedule. Let Γ_0 be
the task set which is composed of the tasks in the unknown optimal schedule. Γ_0 is a subset
of Γ. The super sequence of Γ_0 is denoted by A_0. In addition, we use Γ_j, j = 1, ..., u, to
represent a task set derived by adding j tasks into Γ_0, A_j the super sequence of Γ_j, and p_j
an optimal schedule of Γ_j. When we say adding j tasks into Γ_0, we mean that the resultant
task set Γ_j is composed of distinct tasks and Γ_j is a subset of Γ. In particular, Γ_u = Γ.
We will prove by induction on u to show that there exists an optimal schedule p_u for Γ
such that p_u conforms to A_u.

Base step, u = 0: there is no task rejected. Γ = Γ_0. Since Γ is feasible, by the Dominance
Theorem, there exists an optimal (full) schedule p_0 for Γ such that p_0 conforms to A_0.

Induction hypothesis: assume that the theorem holds when u = j, i.e., |p_0| = n - j. For
the task set Γ_j which is derived by adding j tasks into Γ_0, there exists an optimal schedule
p_j for Γ such that p_j conforms to A_j. Notice that |p_j| = |p_0|, and |Γ_j| = |Γ_0| + j.

Now consider the case when u = j + 1, i.e., |p_0| = n - (j + 1). We need to reject j + 1
tasks to make a feasible schedule. There exists an optimal schedule p_j for Γ conforming to
A_j by induction hypothesis. We want to show that, by swapping and replacing the tasks
in p_j, the resultant sequence p_{j+1} conforms to A_{j+1}; besides, |p_{j+1}| = |p_j|, and f_{p_{j+1}} ≤ f_{p_j},
which implies that p_{j+1} is also an optimal schedule for Γ. Let T_z be the task added into Γ_j
to make Γ_{j+1}. So, Γ_j ∪ {T_z} = Γ_{j+1}. There are two possibilities when T_z is added.

If T_z is a nontop task of Γ_{j+1}, adding T_z does not add a top task into Γ_j. The orders
of the top tasks in both A_{j+1} and A_j derived through rule R1 are exactly the same. Rule
R2 specifies the relation between a nontop task and a top task. Adding a nontop task
does not affect the relations between the already existent nontop tasks and top tasks. The
positions (duplicates) of the already existent nontop tasks in A_j are preserved in A_{j+1}. Rule
R3 specifies how to arrange the order of the nontop tasks within each subset. Again, adding
a nontop task T_z does not alter the orders of the already existent nontop tasks in each subset
in A_j. Therefore, if the task being added is a nontop task, A_j is a subsequence of A_{j+1}. Let
us look at the example in Figure 5. Assume that Γ_j and Γ_{j+1} are

Γ_j = {T2, T3, T4, T5}, and
Γ_{j+1} = {T1, T2, T3, T4, T5},

where T1 is a nontop task. The corresponding super sequences would be

A_j = (T2, T3, T4, T2, T3, T5), and
A_{j+1} = (T1, T2, T3, T4, T2, T3, T1, T5).

We can see in the example how A_j conforms to A_{j+1}.
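
The relation between the two super sequences in this example can be checked mechanically. The sketch below assumes that "conforms to" can be read as "can be extracted as a subsequence", an interpretation taken from the way the text uses the two phrases side by side; the function name conforms is introduced here for illustration.

```python
from typing import Iterable, Sequence


def conforms(seq: Sequence[str], super_seq: Iterable[str]) -> bool:
    """Return True if seq can be obtained from super_seq by deleting elements."""
    remaining = iter(super_seq)
    # `task in remaining` consumes the iterator up to the first match, so each
    # task must be found after the position where the previous one was found.
    return all(task in remaining for task in seq)


# Super sequences of the example above (before and after adding T1).
A_j = ["T2", "T3", "T4", "T2", "T3", "T5"]
A_j1 = ["T1", "T2", "T3", "T4", "T2", "T3", "T1", "T5"]

print(conforms(A_j, A_j1))
```

Deleting both occurrences of T1 from A_{j+1} leaves A_j, which is why the check succeeds for this pair.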

Otherwise, T_z is a top task of Γ_{j+1}. T_z does not contain other tasks in Γ_{j+1}. Two
situations are possible.

(i) T_z is not contained by other tasks. The number of top tasks in Γ_{j+1} is one more than
that of the top tasks in Γ_j. The order of the top tasks in A_j is preserved in A_{j+1}, since
the relations of the top tasks are not altered by adding T_z. Furthermore, T_z does not alter
any existent orders among the nontop tasks and top tasks, or among the nontop tasks and
nontop tasks, specified by rules R2 and R3, respectively. Therefore A_j is
a subsequence of A_{j+1}. Let us look at the example in Figure 6. Assume that Γ_j and Γ_{j+1}
are

Γ_j = {T1, T2, T4, T5}, and
Γ_{j+1} = {T1, T2, T3, T4, T5},

where T3 is a top task not contained by other tasks. The corresponding super sequences
would be

A_j = (T1, T2, T1, T4, T5, T4), and
A_{j+1} = (T1, T2, T1, T3, T4, T5, T4).

We can see in the example how A_j conforms to A_{j+1}.

Figure 6: The added top task T3 is not contained by other tasks.

(ii) T_z is contained by some top tasks and/or nontop tasks of Γ_j. Let the top tasks of
Γ_j containing T_z be g_1, ..., g_m, indexed in the weakly leading order. This situation is more
complicated, because g_i, i = 1, ..., m, turn out to be nontop tasks in Γ_{j+1}. There exists a
total ordering of them by weakly leading relations, because there is no containing relations
among g_i. By rule R1, the super sequence of Γ_{j+1} can be expressed as

A_{j+1} = (..., g_1, ..., g_m, ..., h_k, ..., g_1, ..., g_m, ..., h_{k+1}, ...),

where h_1, ..., h_{k+1}, ... are the top tasks in Γ_{j+1}, and in particular, h_k represents
T_z. By rule R3, the super sequence of Γ_{j+1} can also be expressed as

Aj+i  =  (. . . ,  ^*5  -Bfc.irT’  -Sx.i+i,  •  •  ■}> 

>  .  ^ 

^;45 

— 


where  represents  the  subsequence  of  Aj^-i  between  and  excluding  hk^i 

and  hk+i,  as  depicted  above,  and  =  B*_ijen;+i©Bx,i+i,  where  ©  means  concatenation 
of sequences. Remember that g_1, ..., g_m are top tasks of Γ_j. All the top tasks in Γ_j are in
the order of h_1, ..., h_{k-1}, g_1, ..., g_m, h_{k+1}, ... by the weakly leading relations. By rule R1,
the super sequence of Γ_j can be expressed as

> - - - — ^ 
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f  A  between  ).t-a  h^,  excluding 

where  Clj  represents  the  subsequence  ,  subsequences  before  h-i 

b..r.  us  depicted  nbove,  Notice  tbnt  .n  top  tush  b.  or  T. 

of  both  Ai«  end  A;  are  exactly  the  same  gi^lslly,  the  subsequences  after  ba« 

1  inLrof  A,  only  in  the  the  i„,,,aiately  after  ha-,.  By 

Now  we  would  bke  to  cbed  what  ‘usks  m  >  _  dieck  the  tasks 

Lemma  7,  aU  the  tasks  in  can  be  not  contain  p>.  Because 

in  ni«  -  h-  If  °J;  3jso  contain  h,.  When  constructing  the 

p,  contains  ba,  any  task  wbrcb  con  aans  sr  ^  of 

subsequence  of  A,-  between  ba-a  and  Sa,  subsequence  as 

A^+a  should  foBow  immediately  after  ba-i.  and  by 

exactly  the  same  as  but  do  not  contain  gu  so  they 

One  may  observe  that  some  tasks  in  a-i,t  tasks  of 

would  also  be  positioned  betwMU  a-i  an  deadnnes  than  those 

This  is  because  they  do  -t- ^  B, a,,  of  A^v,  should  be 

tasks’  in  Ba.r,^  ^or  .Iructing  A.  and  the  order  is  the  same.  Hence, 

located  immediately  before  wnen 

the  A,-  can  be  further  expressed  as 


,  t  D  _  ...Oi _ .5mt— 


j 

D  anfl  B-  of  A,-,  excluding 

where  Qi  represents  q/a^lhe  tasks  in  fi;  are  either 

and  B-^  i+i-  We  have  ^  ® 

in  or  in  Bi=i.fc-  _  F;,niTe  7  The  task  set  in  the  hgnie  is  Ti+i-  And  r^+i  - 

Let  us  look  at  an  example  m  o  „  ^  classified  as  nontop  tasks  in  r,>i, 

would  be  r,.  51  -nd  5:  contain  only  h,.  So  51  and  51 
Figure 7: The added top task h_k is contained by other tasks.


and as top tasks in Γ_j. We can compute the super sequences as follows.


^2,73, r4,hk»i,  T4^  . r2^  Zs, Ti, Pi , g2 ?  h^, ? 92^  F3,  Tj  ,  Ts 


-Bjc.k+:  ^I,k+2 


Now going back to examine Equations 6 and 8, we can find that A_j and A_{j+1} only differ in
the middle subsequences represented by Ω_j and Ω_{j+1}. This can also be seen in the example in
Figure 7. The instances extracted from A_j would conform to A_{j+1} except the corresponding
middle subsequence mentioned above. Remember that p_j conforms to A_j. We would try to
adjust the order of the tasks of the subsequence in p_j which correspond to Ω_j for the purpose
that the resultant schedule p_{j+1} conforms to A_{j+1} and p_{j+1} is also an optimal schedule of
Γ. The adjustment procedure, called the swapping and replacing method, applied to p_j is
described below:
C1: for all tasks T_y ∈ Ω'_j such that f_y ≤ d_{h_k}, they are sorted by their ready times r_y.

C2: for all tasks T_y ∈ Ω'_j such that s_y ≥ r_{h_k}, they are sorted by their deadlines d_y.

C3: a task can be sorted by C1 or C2 described above if it satisfies both of them.

C4: if there exists a task T_y ∈ Ω'_j such that s_y < r_{h_k} and f_y > d_{h_k}, T_y is replaced by h_k.

r  Lt:;: s  rr;:^ 

A,,  the  head  and  the  t.l  of  ,,  ,,  00^0^3  to 

middle  subsequ»«rf«4.^.^^  “d  replaong  ^  ,,  the  tasks  in 

issks  adjuste  1  °  ,  r3  so  K  does  not  matter  v,hich  task 

ty  Lemma  ^  uncording  to  rule  R3,  so  >  ^,od  by  therr 

Bk-y,k  can  be  detenru  order  of  tbe  tasks  in  B,.^,k  a.ocoTding  to 

is  located  before  which.  -  n  ■  jf  ’  qualifying  tasks  are  or 

TeLy  times.  During  the  adjustment  of  d.  ^  ^  B..,,a  axe  tes  thm  the 

reaai  ^  times  ui  tbe  tasks  oi  m-i,k 

thai  ready  times.  ^  ^  resultant  sche  ^  ^ ^  ^  tasks  in 

readv  times  of  the  tasks  in  S,.r.k-  This  indicates  that  the  order  oi  the 

are  positioned  before  the  tasks  of  ^o^on,  the  adjus  m 

^  C^  COUfOTinS  to  if  such  a.  iy 

the  adiustment  of  Oi  coiuwx  r  a  • .,  In  condition  04,  u  ^ 

ra'rfa^s  the  order  of  the  snapped  tasks  ^  i.  Equation  6.  Ea^ 

Ls,  replacing  T..  hy  hr  also  "  “  U.  Hence.,  all  the  tasks  in  the  rmddl 

t  e  n;  "rr:-  -  - 

subsequence  of  pj^i  to  A -+1  can  also  he 

conforms  to  A j+i-  +v  +  o-  ,  in  addition  to  conforming  j  ’  ,■> 

u  la-e  to  show  that  pj+i,  ,  ..  .  qi  as  having  the  same 

1,-ow  we  would  hke  to  condition  01  ^ 

foiished  no  later  than  p,-  ^  ^  hdort  this  time  ms  an  .  ^;^^ch 

^urrual  deadlines  of  deadlines  hy  r«ultant 

ordering  among  the  task  trme 

is  achieved  by  sorting  therr  leaoj  trm  • 
schedule after the adjustment of C1 would not be greater than that of the original schedule.
Similarly, the tasks satisfying condition C2 can be viewed as having the same virtual ready
times of r_{h_k}, because they all start after this time instant. For the same reason, the finish
time of the resultant schedule after the adjustment of C2 would not be greater than that of
the original schedule. In condition C3, the qualifying tasks can be sorted in either way and
this does not affect the result. In condition C4, if there exists a task T_y whose computation time
covers the whole window of the rejected task h_k, we may as well replace T_y by h_k, and the
finish time of the resultant schedule after the adjustment of C4 would not be greater than
that of the original schedule. Each task T_y ∈ Ω'_j satisfies one of the conditions by Lemma 11.
Hence, all the tasks in the middle subsequence of p_{j+1} are adjusted in such a way that p_{j+1}
would be finished no later than p_j. Therefore, |p_{j+1}| = |p_j|, and f_{p_{j+1}} ≤ f_{p_j}. Since p_j is an
optimal schedule of Γ, p_{j+1} is also an optimal schedule of Γ.

How the adjustment procedure makes the finish time shorter is illustrated by Figure 8.
In Figure 8(i), both T1 and T3 satisfy condition C3, and T2 satisfies condition C1. The
procedure of C1 is applied to all these three tasks and makes the finish time shorter. The
dotted task window frame in the figure indicates that h_k is a rejected task. In Figure 8(ii),
C1 is applied to the qualifying tasks T1 and T2. And by C4, T3 is replaced by h_k. This
makes the finish time shorter. While it is h_k that is rejected before the adjustment, it turns
out that T3, whose computation time covers the whole window of h_k, is rejected after the
adjustment.

So far, we have shown that p_{j+1} conforms to A_{j+1}, and that p_{j+1} is also an optimal
schedule of Γ. The theorem is thus verified by the induction. It deserves our attention that
we do not really apply the swapping and replacing procedure to any schedule. We just want
to show the existence of the optimal schedule which is p_{j+1} in the context. To make it clear,
the structure of the theorem is illustrated in Figure 9. □

3.4.3  Corresponding  Lemmas 

The lemmas used by the conformation theorem are demonstrated as follows.
Lemma  5  D  Bk-i,k  =  Bk-i,k  n  ^  =  0 
Figure 8: The swapping and replacing procedure C1, C3 and C4.


^  ^  “  ^k.k^i  ^ 


Proof;  We  first  sho^.  that  n  B,.^.k  =  0-  Given  any  task  T,  €  not 

contain  hk  by  definition.  Hence,  T,  does  not  belong  to  Bk-i.k.  So  n  Bk-u  -  0-  The 

others  can  be  proved  similarly. 


Lemma  6  Bk~i.k  U  Hprr.jt  —  ^  ^k,k-Ti- 

Proof:  We  first  prove  that,  if  a  task  Ty  €  Bk-i,k  U  then  Ty  €  U  Bk,k+y‘ 

r,  contains  by  definition,  T,  must  have  a  location  after  hr  too  by  mle  M. 
_  U  Bi,r«  U  Bj  J.J, .  Ty  does  not  belong  to  ,  so  T,  €  Br,I57  U  Bk,i+!  •  e 

cLi  proT similarly  that  'if  a  task  T,  €  U  ^  -S*-*'*  ^ 

Bk-i,k  U  Bjzi,k  =  • 
Figure 9: The Structure of the Conformation Theorem

Lemma  7  If  and  only  u  T,  €  then  €  H,-. 

Proof:  We  will  first  prove  the  ^if”  part.  By  Lemma  6,  So 

we  only  need  to  check  against  j  U  Bk-i,k  U  Rjn,*  U  Rx,i+i  •  H  T^  €  p  or  e  Bk-i,k , 
then  Ty  has  a  location  between  the  top  tasks  hk-i  and  gi  in  Aj.  This  is  because  Ty  contains 
kk-u  Ty  has  a  location  after  hk-i  by  rule  R2.  So  Ty  €  flj.  Then  consider  Ty  €  Bjzi,k- 
is  either  a  top  task  or  a  nontop  task  in  A^.  If  Ty  is  a  top  task  in  A^,  Ty  must  be  one  of 
the  p,-,  i  =  1, . . .  ,7n  by  the  defimtion  of  p,-.  If  Ty  is  a  nontop  task  in  Ay,  it  must  contain  at 
least  one  top  task,  which  is  a  top  task  among  pi, . . .  ,p„,,  or  ht+j  by  referring  to  Equation  7. 
Notice  that  Ty  must  not  contain  hk—i  since  Ty  €  1^®  matter  whether  Ty  is  z.  top  or 

nontop  task  in  Ay,  Ty  has  a  location  in  fly  by  rule  R2.  If  Ty  €  then  Ty  has  a  location 

between  the  top  tasks  and  hk+i  in  Ay  by  rule  Pc2.  So  Ty  €  Hy. 
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Now  we  prove  the  ’only  if”  part.  Ut  Ty  be  a  task  located  in  If  T,  is  one  of  the  top 

tasks  of  Sr , ,  r,  contains  hy.  Either  T,  €  Bs-r,t  or  T,  €  Otherw.se.  Ty  .s  a  non. 

top  task  of  A,-.  By  rule  R2,  T,  contains  at  least  one  of  the  top  tasks  of  hk-i  -,9^,  **+i- 
H  Ty  contains  one  of  pi, ...  ,5m,  then  Ty  also  contains  h.  So  Ty  contains  either  h^-u  or  h, 
or  hk^,.  By  rule  R2,  Ty  should  fall  between  hk-,  and  hk,  and/or  between  hk  and  hk+^.  bo 

Ty  €  U  Bk-i,k  U  Bjzi,k ^ 

This  lemma  means  that  the  tasks  in  -  hk  axe  exactly  the  same  tasks  which  are  in 

Qj. 

Lemma 8 If T_y ∈ Ω'_j, T_y contains h_k.

PROOF: We would like to show that if T_y does not contain h_k then T_y does not belong to
Ω'_j. Since g_i, i = 1, ..., m, contains h_k, that T_y does not contain h_k means that T_y does
not contain g_i either. Hence T_y cannot have a location in the subsequence between g_1 and
g_m in Ω'_j. The only possible locations of T_y to fall in Ω_j are either between h_{k-1} and g_1 or
between g_m and h_{k+1}. If T_y falls between h_{k-1} and g_1, that T_y does not contain g_1 implies
that T_y contains h_{k-1} by rule R2. That is, T_y ∈ B_{k-1,k}. A nontop task cannot have duplicate
positions in the same region between two adjacent top tasks. B_{k-1,k} is located between h_{k-1}
and g_1 by Equation 8. T_y does not have a location in the head of Ω'_j before g_1. For the same
reason, T_y does not have a location in the tail of Ω'_j after g_m. So T_y does not belong to Ω'_j.
Therefore, if T_y ∈ Ω'_j, T_y contains h_k.

Lemma  9  If  Ty  €  Ty  €  Bk-i,k  U  B-^j,  =  Bk,k+-i- 

Proof;  If  Ty  €  H  •,  then  Ty  €  By  Lemma  7,  Ty  €  U  Bk-i,k  U  BjZk,k  ^  ^kj^^ 

-r-  . .  r,  Ti7_  ♦!»,.+  T  />nTit.niTi<;  Ki.  bv  Lemma  8.  so  T„  €  Bk-i,k  U  •®fc>+i 


PROOF;  If  Ty  €  12^,  tnen  iy  iij.  -oy  «,  .ty  ^  -t-i.fc  ''  -  s-x.a 

Sn  U  Br .  ,  We  know  that  Ty  contains  hk  by  Lemma  8,  so  Ty  €  Bk-i,k  U  B-j^^k  ^  ^k, 

Bk  Jc+i  •  Also  by  Lemma  6,  we  have  Ty  €  Bk-i,k  U  B-^  k  ~  ^  *.^+1- 

Lemma 10 Assume that S = L_pre ⊕ L ⊕ L_post is a feasible sequence, where L = (T_{y_1}, T_{y_2}, ..., T_{y_m}).
If there exists a sequence L̂ = (T_{z_1}, T_{z_2}, ..., T_{z_m}) such that L̂ is a permutation of L and the
tasks of L̂ are ordered by the weakly leading relation, we have f_{L_pre ⊕ L̂ ⊕ L_post} ≤ f_{L_pre ⊕ L ⊕ L_post}.

Proof: We bubble sort the tasks of L in weakly leading order. The swapping only occurs
between two adjacent tasks. For each swapping, we apply the Leading Theorem to the
adjacent tasks, which correspond to Tp and To respectively in the theorem. No other tasks lie in
between the two tasks during each individual swapping. So the finish time of the resultant
schedule is not greater than that of the original schedule according to the Leading Theorem.
□

Lemma 11 A task T_y ∈ Ω'_j should satisfy one of the conditions C1, C2, C3 or C4.

PROOF: If T_y ∈ Ω'_j, then T_y ∈ B_{k-1,k} ∪ B_{\overline{k-1},k} by Lemma 9, which implies that T_y ⊔ h_k. We
have r_y ≤ r_{h_k} and d_y ≥ d_{h_k}. There are four possibilities.

(i) s_y < r_{h_k} and f_y > d_{h_k}: C4 is satisfied.

(ii) s_y ≥ r_{h_k} and f_y ≤ d_{h_k}: C3 is satisfied.

(iii) s_y < r_{h_k} and f_y ≤ d_{h_k}: C1 is satisfied.

(iv) s_y ≥ r_{h_k} and f_y > d_{h_k}: C2 is satisfied.

□
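
The case analysis of Lemma 11 amounts to a two-way test on the start time and the finish time of a task relative to the window of h_k. The Python sketch below illustrates that partition; the function name classify and the exact placement of strict versus non-strict comparisons are assumptions made here so that the four cases are mutually exclusive and exhaustive.

```python
def classify(s_y: float, f_y: float, r_hk: float, d_hk: float) -> str:
    """Return which of C1..C4 applies to a task with start time s_y and
    finish time f_y, given the ready time r_hk and deadline d_hk of h_k."""
    starts_before_window = s_y < r_hk    # assumed strict
    finishes_after_window = f_y > d_hk   # assumed strict
    if starts_before_window and finishes_after_window:
        return "C4"  # execution covers the whole window of h_k
    if not starts_before_window and not finishes_after_window:
        return "C3"  # execution lies inside the window of h_k
    if starts_before_window:
        return "C1"  # finishes by d_hk: sorted by ready time
    return "C2"      # starts at or after r_hk: sorted by deadline
```

Because the two boolean tests cover all four combinations, every task receives exactly one label, which is the property the lemma relies on.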


3.5 Set-Scheduler Algorithm

By the Conformation Theorem, we have shown that there exists an optimal schedule which
conforms to the super sequence A. Hence, we can use Sequence-Scheduler to schedule
each instance in the super sequence, and pick the best one. Since Sequence-Scheduler
obtains the optimal schedule for each instance, we end up with the optimal schedule for
the task set. The algorithm for scheduling a task set is given in Figure 10. The Sequence-
Scheduler takes O(n^2) time for each instance, while there are

N = ∏_{q ≥ 1} (q + 1)^{n_q}

instances to check in the super sequence as illustrated in the previous section. The time
complexity of Set-Scheduler algorithm is thus O(N * n^2).
Algorithm Set-Scheduler:

Input: a task set Γ = {T1, T2, ..., Tn}
Output: the optimal schedule p for Γ

compute the super sequence A for Γ
p := ∅
for each instance I in the super sequence A
    invoke Sequence-Scheduler to compute the optimal schedule σ(n) of I
    if (|p| < |σ(n)|) or
       (|p| = |σ(n)| and f_p > f_{σ(n)}) then
        p := σ(n)
    endif
endfor


Figure 10: Set-Scheduler Algorithm
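
The loop of Figure 10 can be sketched in Python as follows. This is an illustration under stated assumptions, not the report's implementation: an instance is taken to be a sequence that keeps each task at exactly one of its positions in the super sequence, and make_greedy_scheduler is a hypothetical placeholder standing in for Sequence-Scheduler, whose details are not given in this section.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple

# A schedule is summarized as (accepted tasks in execution order, finish time).
Schedule = Tuple[Tuple[str, ...], float]
# Hypothetical task parameters: (ready time, computation time, deadline).
Params = Dict[str, Tuple[float, float, float]]


def instances(super_seq: Sequence[str]) -> Iterator[List[str]]:
    """Yield every instance: each task kept at exactly one of its positions."""
    positions: Dict[str, List[int]] = {}
    for idx, task in enumerate(super_seq):
        positions.setdefault(task, []).append(idx)
    for choice in product(*positions.values()):
        yield [super_seq[i] for i in sorted(choice)]


def set_scheduler(super_seq: Sequence[str],
                  sequence_scheduler: Callable[[Sequence[str]], Schedule]
                  ) -> Optional[Schedule]:
    """Keep the schedule with the most accepted tasks, then the earliest finish."""
    best: Optional[Schedule] = None
    for inst in instances(super_seq):
        cand = sequence_scheduler(inst)
        if (best is None
                or len(best[0]) < len(cand[0])
                or (len(best[0]) == len(cand[0]) and best[1] > cand[1])):
            best = cand
    return best


def make_greedy_scheduler(params: Params) -> Callable[[Sequence[str]], Schedule]:
    """Placeholder for Sequence-Scheduler (NOT the report's algorithm): run the
    tasks in the given order and skip any task that would miss its deadline."""
    def schedule(order: Sequence[str]) -> Schedule:
        now = 0.0
        accepted: List[str] = []
        for task in order:
            ready, comp, deadline = params[task]
            finish = max(now, ready) + comp
            if finish <= deadline:
                accepted.append(task)
                now = finish
        return tuple(accepted), now
    return schedule
```

With this notion of an instance, the number of instances is the product of the occurrence counts of the tasks. For a super sequence such as (T1, T2, T3, T4, T2, T3, T1, T5), where three tasks occur twice, that gives 2 * 2 * 2 = 8 instances.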


4  Evaluation 

Experiments are conducted to compare the performance of Set-Scheduler with those of the
well-known Earliest-Deadline-First and Least-Laxity-First heuristic algorithms. The
relations among the tasks are important for the schedulability of the tasks. To study the
differences between different cases, we allow the variation of the computation times, and the
interarrival times, which are the time intervals between the ready times of two consecutive
tasks. Tasks in a task set are generated in non-descending order by their ready times. The
parameters of the experiments are random variables with truncated normal distribution, as
shown in Figure 11. If the computation time of a task is greater than its window length, the
computation time is truncated to its window length. Such a truncation is not applied to the
interarrival times.

The mean of Window is fixed. Computation time ratio is the ratio of the computation
time to the window length. The mean of Interarrival time ranges from 10% to 100% of the
mean of Window. The standard deviations of these three random variables are set to be
20%, 50%, and 80% of their means. For simplicity, the ratios of the three random variables
are set to be the same for each individual experiment. For each experiment with different
parameters, 100 task sets, each with 12 tasks, are generated for scheduling.

parameters                 mean
Window length              10.0
Computation time ratio     0.25, 0.5, 0.75
Interarrival time          1.0, 2.0, ..., 10.0

Figure 11: Parameters of the experiments
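
A task-set generator along the lines described in this section might look like the sketch below. Several details are assumptions made here because the text does not state them: non-positive draws are handled by redrawing, the first ready time is 0, and the function and parameter names are illustrative.

```python
import random
from typing import List, Tuple

Task = Tuple[float, float, float]  # (ready time, computation time, deadline)


def positive_normal(rng: random.Random, mean: float, std_ratio: float) -> float:
    """Draw from a normal distribution with standard deviation std_ratio * mean,
    redrawing until the value is positive (one possible truncation rule)."""
    while True:
        value = rng.gauss(mean, std_ratio * mean)
        if value > 0:
            return value


def generate_task_set(rng: random.Random,
                      n_tasks: int = 12,
                      window_mean: float = 10.0,
                      comp_ratio_mean: float = 0.5,
                      interarrival_mean: float = 5.0,
                      std_ratio: float = 0.2) -> List[Task]:
    """Generate tasks in non-descending order of ready time."""
    tasks: List[Task] = []
    ready = 0.0
    for _ in range(n_tasks):
        window = positive_normal(rng, window_mean, std_ratio)
        comp = positive_normal(rng, comp_ratio_mean, std_ratio) * window
        comp = min(comp, window)  # truncate computation time to the window length
        tasks.append((ready, comp, ready + window))
        # Interarrival times are not truncated against the window length.
        ready += positive_normal(rng, interarrival_mean, std_ratio)
    return tasks
```

An experiment with a given parameter combination would then call generate_task_set 100 times with a seeded random.Random instance so that the runs are reproducible.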

We compare the performance of these algorithms by (1) Percentage of accepted tasks:
the number of accepted tasks by the algorithm over the number of the tasks of the optimal
schedule by exhaustive search; (2) Success ratio: the number of times that the algorithm
comes up with an optimal schedule in the 100 task sets; and (3) Comparisons per task set:
the number of comparisons per task set that each algorithm takes. When interarrival times
are small, more containing relations among tasks are likely to happen. Figure 12 shows that
the heuristic algorithms perform worse under this condition and tend to reject more tasks,
especially when the computation time ratio is larger. Set-Scheduler always reaches 100%
acceptance rate since it is an optimal scheduler. In the figure, because the characteristics
of the data with different standard deviation ratios are similar, only the data with standard
deviation ratio equal to 0.8 are depicted. When success ratio is concerned, which can be seen
in Figure 13, the heuristic algorithms perform even worse. Generally speaking, the heuristic
algorithms can usually produce suboptimal schedules, but fail to produce the optimal ones
most of the time. The search space is shown in Figure 14. Set-Scheduler performs well at
the expense of the complexity, which may become very large when the interarrival times are
small. The cost is more reasonable while the interarrival times between tasks are not too
small.
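
The first two measures can be computed as in the following sketch. It assumes that each run is summarized by a pair (number of accepted tasks, finish time) and that the percentage is aggregated over all task sets by summing task counts; neither the data layout nor the aggregation rule is specified in the text, and the function names are illustrative.

```python
from typing import List, Tuple

Result = Tuple[int, float]  # (number of accepted tasks, finish time)


def accepted_percentage(results: List[Result], optima: List[Result]) -> float:
    """Accepted tasks of the algorithm over accepted tasks of the optimum, in %."""
    accepted = sum(r[0] for r in results)
    optimal = sum(o[0] for o in optima)
    return 100.0 * accepted / optimal


def is_optimal(result: Result, optimum: Result, eps: float = 1e-9) -> bool:
    """A run counts as a success if it accepts as many tasks as the optimum
    and finishes no later (within a small tolerance)."""
    return result[0] == optimum[0] and result[1] <= optimum[1] + eps


def success_ratio(results: List[Result], optima: List[Result]) -> float:
    """Fraction of task sets for which the algorithm found an optimal schedule."""
    hits = sum(1 for r, o in zip(results, optima) if is_optimal(r, o))
    return hits / len(optima)
```

Under these definitions an algorithm can score a high accepted percentage while having a low success ratio, which matches the observation that the heuristics often produce suboptimal schedules that are close to optimal.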
Figure 12:

Figure 14: Number of comparisons per task set

5  Conclusion  Remarks  and  Future  Work 

In this paper, we discuss the optimization techniques in real-time scheduling for aperiodic
tasks in a uniprocessor system with the non-preemptive discipline. We first propose
a Sequence-Scheduler algorithm to compute the optimal schedule for a sequence in $O(n^2)$
time. Then a Set-Scheduler algorithm is proposed based on the super sequence and the
Sequence-Scheduler algorithm. The complexity of our Set-Scheduler algorithm is $O(N \cdot n^2)$, compared
to $O(N \cdot n)$ for the feasibility test by Erschler et al., where $N$ might be as large as
exponential in the worst case. However, our simulation results show that the cost is reasonable
for the average case. We explore the temporal properties concerning the optimization issues,
and present several theorems to formalize the results. The study of temporal properties on
a uniprocessor may serve as a base for the more complex cases in multiprocessor systems.

For future work, we propose to incorporate the decomposition technique [18] into
our scheduling algorithm. Under this approach a task set can be decomposed into subsets,
which results in backtracking points to reduce the search space. This has been shown to be
useful in reducing the search space substantially when the task set is well decomposable.
ABSTRACT

Let us consider the problem of scheduling a set of $n$ tasks on a single processor
such that a feasible schedule which satisfies the time constraints of each task is
generated. It is recognized that an exhaustive search may be required to generate
such a feasible schedule or to assure that there does not exist one. In that case
the computational complexity of the search is of the order $n!$.

We propose to generate the feasible schedule in two steps. In the first step
we decompose the set of tasks into $m$ subsets by analyzing their ready times and
deadlines. An ordering of these subsets is also specified such that in a feasible
schedule all tasks in an earlier subset in the ordering appear before tasks in a later
subset. With no simplification of scheduling of tasks in a subset, the scheduling
complexity is $O(\sum_{i=1}^{m} n_i!)$, where $n_i$ is the number of tasks in the $i$th subset.

The improvement of this approach towards reducing the scheduling complexity
depends on the number and the size of subsets generated. Experimental results
indicate that significant improvement can be expected in most situations.
I Introduction

Consider the problem of nonpreemptive scheduling of $n$ tasks on a single CPU of a hard
real-time system. For task $T_i$, identified as $i$, the scheduling request consists of a triple
$\langle c_i, r_i, d_i \rangle$ where $c_i$ is the computation time, $r_i$ the ready time before which task $i$ cannot
start, and $d_i$ the deadline before which the computation must be completed. Time interval
$[r_i, d_i]$ is called the time window, denoted by $w_i$. The window length $|w_i|$ is $d_i - r_i$. In a
hard real-time system, a schedule is called feasible if all tasks are processed within their
individual windows.

The result of the scheduling process is a schedule in which, for any task $i$, a start time
$s_i$ and a finish time $f_i$ are identified, where $f_i = s_i + c_i$. Clearly, a schedule is feasible if, for
every task $i$,

$$r_i \le s_i \le d_i - c_i. \qquad (1)$$

The scheduling process is nonpreemptive only if, for any two tasks $i$ and $j$,

$$s_i < s_j \Rightarrow s_i + c_i \le s_j. \qquad (2)$$

In other words, when task $i$ is scheduled, a span of nonpreemptable processing time,
$c_i$, is allocated for it. No other task may be in execution during that time span. Thus the
scheduling problem is to find a mapping from a task set $\{i\}$ to a start time set $\{s_i\}$, such
that constraints in (1) and (2) are met. Note that for a given set of tasks $\{i\}$, there may
be none, one or many feasible schedules.
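
As an illustration of constraints (1) and (2), the following minimal sketch checks whether a proposed set of start times is feasible. The function name `is_feasible` and the dictionary-based representation are assumptions made for this example only; each task is given as a triple (c, r, d) as in the text, and computation times are assumed to be positive.

```python
from typing import Dict, Tuple

# A task is the triple (c, r, d): computation time, ready time, deadline.
Task = Tuple[float, float, float]


def is_feasible(tasks: Dict[str, Task], start: Dict[str, float]) -> bool:
    """Check constraints (1) and (2) for the given start times (hypothetical helper)."""
    # Constraint (1): every task starts inside its window and finishes by its deadline.
    for name, (c, r, d) in tasks.items():
        s = start[name]
        if not (r <= s <= d - c):
            return False

    # Constraint (2): no two tasks overlap. With positive computation times it is
    # enough to compare neighbours after sorting by start time.
    order = sorted(tasks, key=lambda name: start[name])
    for a, b in zip(order, order[1:]):
        if start[a] + tasks[a][0] > start[b]:
            return False
    return True


tasks = {"a": (2, 0, 5), "b": (1, 1, 4)}
print(is_feasible(tasks, {"a": 0, "b": 2}))  # both windows respected, no overlap
print(is_feasible(tasks, {"a": 0, "b": 1}))  # b would start while a is running
```

Sorting by start time keeps the overlap check at O(n log n) instead of comparing every pair of tasks.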

In general, the nonpreemptive real-time scheduling problem is known to be NP-complete
[Gare79]. To find a feasible schedule, the number of schedules to be examined is $O(n!)$,
which we count as the scheduling complexity. Heuristic techniques can be used [Ma84,
McMa75, Mok83, Zhao87] to reduce the complexity. This reduction, however, is achieved
at the cost of obtaining a potentially sub-optimal solution. That is, when looking for feasible
schedules, heuristic techniques may not yield a feasible schedule, even though one exists.
Schedules based on the earliest-deadline-first or minimum-laxity-first rules are examples of
such heuristics used in scheduling.

An alternate approach is to develop analytical methods for scheduling [Ersc83, Liu73].
This approach analyzes the relationships among real-time tasks and schedules. The purpose
is to precisely determine optimal task schedules, or narrow the search scope from the original
search space.

The objective of this research is to develop correct and efficient algorithms for nonpreemptive
real-time scheduling. We call a scheduling algorithm correct if, whenever a feasible
schedule exists, the algorithm can find it.
In this paper, we present an analytical decomposition approach for real-time scheduling.
The  strategy  is  to  divide  a  set  of  tasks  into  a  sequence  of  subsets,  such  that  the  search  for 
feasible  schedules  is  only  performed  within  each  subset.  The  decomposition  technique  used 
for  generating  the  sequence  of  subsets  assures  that  in  a  feasible  schedule  all  tasks  in  a  subset 
earlier  in  the  sequence  are  scheduled  before  any  task  in  a  later  subset.  Backtracking  in  the 
search  is  bounded  within  each  subset,  which  significantly  reduces  the  scheduling  complexity. 

There are several different strategies which can be used to subset tasks. The decomposition
strategy discussed in this paper is to use a relation called the leading relation which
depends on the tasks' relative window positions.

We  performed  an  experiment  which  examined  the  number  and  size  of  subsets  with 
regard  to  the  number  of  tasks,  task  arrival  rate,  and  window  length.  We  found  that,  in 
general,  the  number  of  tasks  in  any  subset  is  independent  of  the  total  number  of  tasks  to 
be  scheduled,  if  the  task  window  lengths  are  bounded.  The  decomposition  scheduling  is  a 
polynomial  computation.  As  a  consequence,  the  decomposition  method  is  very  practical  for 
the  implementation. 

In section II we present some basic notions used in the paper. In section III we discuss
a case where all the tasks have the leading relation with each other. Our approach of
decomposition scheduling is introduced in section IV along with concepts of the single
schedule subset and decomposed leading schedule sequence. We present our experiment
results in section V. Our conclusion and future research are given in section VI.

II  Background 

If we consider any two tasks $i$ and $j$, they must have one of these three relations:

1. leading - $i \prec j$ (or $j \prec i$), if $r_i \le r_j$, $d_i \le d_j$ and $w_i \ne w_j$.

2. matching - $i \parallel j$, if $r_i = r_j$ and $d_i = d_j$.

3. containing - $i \sqcup j$ (or $j \sqcup i$), if $r_i < r_j$ and $d_j < d_i$.

These three relations are shown in Fig. 1. It is easy to see that the leading, matching
and containing relations are all transitive. Additionally, if $i \parallel j$ or $i \sqcup j$, we say that $i$ and $j$
are concurrent.
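
The three window relations can be sketched as a small classification function. This is only an illustration: the function name `relation` and the returned labels are hypothetical, and the sketch assumes that the leading relation uses non-strict comparisons (with distinct windows) while the containing relation uses strict ones, so that every pair of windows falls into exactly one case.

```python
from typing import Tuple

Window = Tuple[float, float]  # (ready time r, deadline d)


def relation(wi: Window, wj: Window) -> str:
    """Classify the relation between windows wi and wj (hypothetical helper)."""
    (ri, di), (rj, dj) = wi, wj
    if ri == rj and di == dj:
        return "matching"
    if ri <= rj and di <= dj:
        return "i leads j"
    if rj <= ri and dj <= di:
        return "j leads i"
    if ri < rj and dj < di:
        return "i contains j"
    return "j contains i"


def concurrent(wi: Window, wj: Window) -> bool:
    """Two tasks are concurrent if their windows match or one contains the other."""
    return relation(wi, wj) in ("matching", "i contains j", "j contains i")


print(relation((0, 4), (1, 6)))    # i leads j
print(relation((0, 10), (2, 5)))   # i contains j
print(concurrent((1, 3), (1, 3)))  # True
```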

A  length  is  associated  with  a  schedule  which  is  the  finish  time  of  the  last  task  in  the 
schedule.  One  example  is  shown  in  Fig.  2. 

The concept of dominance was introduced in [Ersc83], and we will use it later in the
discussion.

Definition 1 For two schedules $S_1$ and $S_2$, $S_1$ dominates $S_2$ if and only if:

$S_2$ feasible $\Rightarrow$ $S_1$ feasible.
Figure 1: Task window relations (panels: 1. Leading, 2. Matching, 3. Containing; horizontal axis: time)

Definition 2 A set of schedules, $S$, is dominant if $\forall S_2 \notin S$, $\exists S_1 \in S$ such that $S_1$ dominates
$S_2$.

A  schedule  is  dominant  if  it  dominates  all  other  schedules. 

III The Leading Schedule Sequence

Let us consider the case where, for a set of tasks $\{i\}$, every pair of tasks in this set has a
leading relation, i.e. $i \prec j$ or $j \prec i$, for every $j$, $i \ne j$.

Based on the leading relation we can define a total order of tasks for the set. We define
the leading schedule sequence (LSS) to be a sequence in which tasks are in order
according to the leading relation, that is, for any $i$ and $j$, $i \rightarrow j \Leftarrow i \prec j$, where $i \rightarrow j$
means that $i$ is scheduled in front of $j$.

Theorem 1 For a set of tasks all of which have a pairwise leading relation, the schedule
where tasks are sequenced in order of the leading schedule sequence is a dominant one.

Proof:  We  prove  this  theorem  by  construction. 
Figure 2: An example of one schedule (horizontal axis: time, from 0 to the schedule length)

Suppose this set of tasks has a feasible schedule $S$ in which tasks occur in a different
sequence than the leading schedule sequence. When we examine this schedule, let $i$ and $j$
be the first pair of tasks that are not ordered by the leading relation, i.e. $i \prec j$, but $j \rightarrow i$.
From the leading relation we know that $r_i \le r_j$ and $d_i \le d_j$.

Since $j$ and $i$ are the first such pair, deadlines of all tasks between $j$ and $i$ in $S$ must
be greater than or equal to $d_j$ as well as $d_i$. From $S$, let us construct another schedule $S'$ by
moving $i$ from the current position in $S$ to the position just in front of $j$. The start time
and finish time of tasks between $j$ and $i$ including $j$ will be increased by no more than $c_i$.
And so, no task between $j$ and $i$ including $j$ will finish later than $d_i$. Meanwhile, the rest of
this schedule is unchanged. Thus if $S$ is feasible, the new schedule $S'$ will be feasible too.

By repeating the process of constructing $S'$ from $S$, we obtain a schedule which has all
tasks ordered according to the leading relation, such that if the original schedule is feasible,
so is the constructed one. □
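
A practical reading of Theorem 1 is that, for pairwise-leading tasks, only one ordering needs to be tried. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the name `schedule_in_leading_order` is hypothetical, tasks are triples (c, r, d), the input is assumed to be pairwise leading, and each task is started as early as possible.

```python
from typing import List, Optional, Tuple

Task = Tuple[float, float, float]  # (c, r, d)


def schedule_in_leading_order(tasks: List[Task]) -> Optional[List[float]]:
    """Return start times in leading-sequence order, or None if that order is infeasible.

    Assumes every pair of tasks has a leading relation, so sorting by
    (ready time, deadline) yields the leading schedule sequence.
    """
    order = sorted(range(len(tasks)), key=lambda i: (tasks[i][1], tasks[i][2]))
    starts: List[float] = [0.0] * len(tasks)
    clock = float("-inf")  # finish time of the previously scheduled task
    for i in order:
        c, r, d = tasks[i]
        s = max(clock, r)   # start as early as the window and the CPU allow
        if s + c > d:       # deadline missed: by Theorem 1 no other order helps
            return None
        starts[i] = s
        clock = s + c
    return starts


print(schedule_in_leading_order([(2, 0, 3), (2, 1, 6), (1, 4, 8)]))  # [0, 2, 4]
print(schedule_in_leading_order([(3, 0, 3), (2, 1, 4)]))             # None
```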

Thus if there exists a set of feasible schedules, the set must contain schedules that
conform to the order of the leading schedule sequence. The result can be generalized
to the situation where there exist matching windows. The generalization is to combine
the tasks with the same window into one task whose computation time is the sum of the
computation times of all these tasks.

We  will  see  that  the  above  leading  schedule  sequence  is  a  special  case  of  the  decomposed 
leading  schedule  sequence  introduced  in  the  next  section. 

IV  Task  Decomposition 

A.  Philosophy 

To solve the general real-time scheduling problem with $n$ tasks, the number of schedules to
be examined can be as many as $O(n!)$. However, taking a closer look, we find that every
task has an important property called the locality of a task, that is, a task is time-bounded
by its time window. Furthermore, if any two task windows are not overlapping, there is
only one possible order for them. The above facts motivate us to separate the tasks into
subsets according to their different time localities.

The decomposition scheduling can be divided into two steps: decomposition and scheduling.

First, a set of $n$ tasks is decomposed into a sequence of $m$ subsets such that the orders of
subsets  are  fixed.  The  order  of  a  task  is  determined  only  relative  to  the  other  tasks  within 
its  own  subset.  The  sequence  of  the  subsets  is  called  the  decomposed  schedule  sequence. 
The  decomposition  should  be  so  developed  that  the  schedulability  of  tasks  is  not  damaged 
at  all.  The  decomposition  by  using  the  leading  relation  introduced  in  this  paper  shows  this 
property. 

The second step is to schedule the subsets in the sequence order. It always selects
a schedule for each subset with the shortest length, so that when a subset is scheduled,
the time span available for it is maximized. In this way, the total number of schedules
to be examined is only $O(\sum_{i=1}^{m} n_i!)$, where $n_i$ is the number of tasks in the $i$th subset
($\sum_{i=1}^{m} n_i = n$).
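
To give a feeling for the size of this reduction, the following short sketch compares the two counts for one assumed decomposition. The subset sizes used here are made up for illustration and are not taken from the experiments; the function names are hypothetical.

```python
from math import factorial
from typing import List


def exhaustive_count(n: int) -> int:
    """Number of orderings examined by a plain exhaustive search over n tasks."""
    return factorial(n)


def decomposed_count(sizes: List[int]) -> int:
    """Number of orderings examined when each subset is searched independently."""
    return sum(factorial(k) for k in sizes)


sizes = [3, 1, 4, 2]                  # assumed subset sizes n_i, with n = 10
print(exhaustive_count(sum(sizes)))   # 3628800
print(decomposed_count(sizes))        # 33
```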

The  only  remaining  problem  is  how  to  decompose  a  set  of  tasks  into  a  sequence  of 
subsets  of  tasks  such  that  a  feasible  schedule  is  guaranteed  to  be  found  if  one  exists.  In  the 
rest  of  this  paper,  we  outline  how  to  use  the  leading  relation  as  a  means  to  divide  the  task 
set. 

B.  Decomposition  Scheme 

A set of tasks is called the single schedule subset (sss), represented as $\tau$, if

$\forall i \in \tau \; \exists j \in \tau \; (i \sqcup j) \lor (j \sqcup i) \lor (i \parallel j)$.

In other words, each task window is contained in the window of another task, contains
the window of another task, or matches the window of another task in the subset.

Given a set of tasks $\{i\}$, we can decompose it into a sequence of single schedule subsets
such that all the tasks in $\tau^i$ are leading to all the tasks in $\tau^j$, for $i < j$.

The decomposed leading schedule sequence (DLSS) is defined to be a sequence of single
schedule subsets, denoted as:

$DLSS = \tau^1 \circ \tau^2 \circ \cdots \circ \tau^m$,

such that $\forall k^i \in \tau^i \; \forall k^j \in \tau^j$, $k^i \prec k^j$, for $1 \le i < j \le m$ (denoted as $\tau^i \prec \tau^j$), and $\tau^i$ cannot
be further decomposed, for $i = 1, \cdots, m$. Symbol $\circ$ represents a concatenating operation.

Note that if a task in $\tau^i$ does not lead another task in $\tau^j$, for $i < j$, they must have
a matching or containing relation. If this happens, $\tau^i$ and $\tau^j$ cannot be different single
schedule subsets. Clearly, all $n$ tasks may belong to a single schedule subset.

Theorem  2  The  set  of  schedules  conforming  to  the  decomposed  leading  schedule  sequence 
is  dominant. 

Proof: Assume that there are two tasks $k^i \in \tau^i$ and $k^j \in \tau^j$, where $\tau^i \prec \tau^j$. There is no
common concurrent task with both $k^i$ and $k^j$. $k^j$ is positioned in front of $k^i$ in a feasible
schedule $S$. Specifically, $S = (\cdots) \circ (k^j \circ \cdots \circ k^i) \circ (\cdots)$. Let us define $S' = (k^j \circ \cdots \circ k^i)$ for
abbreviation ($S = (\cdots) \circ S' \circ (\cdots)$). The new schedule created by exchanging $k^i$'s position
with $k^j$'s is still feasible.

Without loss of generality, suppose that $k^i$ and $k^j$ are the first such pair in $S$. Tasks
between $k^j$ and $k^i$ are led by $k^j$, or concurrent with $k^j$ but not leading to and not concurrent
with $k^i$. Since $k^i \prec k^j$ (i.e. $r_i \le r_j$), switching $k^i$'s and $k^j$'s positions will not increase the
finish time of $S'$, which is defined as the finish time of the last task in $S'$. All the tasks
between $k^i$ and $k^j$, including $k^j$, are led by $k^i$, i.e. having deadlines greater than or equal
to $d_{k^i}$. If $S$ is feasible with $k^i$ as the last task in $S'$, it will be still feasible after the
switching.


Note  that  if  the  set  of  schedules  that  are  following  the  decomposed  leading  schedule 
sequence  is  empty,  there  is  no  feasible  schedule  available  for  the  tasks  to  be  scheduled. 

C.  Decomposition  Algorithm 

To decompose a set of tasks into single schedule subsets, the algorithm starts with the tasks
sorted by their ready times (using their deadlines if their ready times are the
same).

The algorithm uses one single loop to determine which single schedule subset the current
task should belong to. The loop consists of two parts. The first part is a while loop
which merges single schedule subsets into one, if the current task is contained by them. The
second part decides whether the current task can form a new single schedule subset, or join
with another single schedule subset.


The Leading-relation Decomposition Algorithm
begin
    /* Initialization. */
    k = 1; τ^1 = {1};
    r^1 = r_1; d^1 = d_1;
    for i = 2 to n do    /* Go over the task list. */
        l = k - 1;    /* l is the index of single schedule subsets. */
        continue = TRUE;
        while (l > 0) ∧ (continue) do
            /* Merge single schedule subsets if the current task is concurrent
               with tasks in different subsets. */
            if (d^l > d_i)
                τ^l = τ^l ∪ τ^k;
                d^l = d^k;
                k = l;
            else
                continue = FALSE;
            l = l - 1;
        od
        if (r^k ≤ r_i) ∧ (d_i < d^k)
            /* The current task is concurrent with tasks in the current subset. */
            τ^k = τ^k ∪ {i};
        else if (r^k ≤ r_i) ∧ (d^k ≤ d_i)
            /* The current task is led by all the tasks in the current subset.
               A new single schedule subset is created only containing the current task. */
            k = k + 1;
            τ^k = {i};
            r^k = r_i;
            d^k = d_i;
    od
end
In this algorithm, the outer loop is executed $n$ times. The while loop is executed no
more than a number of times proportional to $n$ in total, since no more than $n$ subsets can
be merged during the whole execution of the algorithm with $n$ tasks. Thus the complexity
of this algorithm is only $O(n)$. If we count in the sorting complexity, the decomposition
will cost no more than $O(n \log n)$.
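
The decomposition can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a transcription of the original program: the name `decompose` is hypothetical, tasks are triples (c, r, d), tasks with identical windows are assumed to have been combined beforehand, and a stack of subsets (each remembering its largest deadline) stands in for the indexed subsets of the algorithm.

```python
from typing import List, Tuple

Task = Tuple[float, float, float]  # (c, r, d)


def decompose(tasks: List[Task]) -> List[List[int]]:
    """Split task indices into an ordered list of single schedule subsets.

    Assumes all task windows are distinct (matching tasks already combined).
    """
    # Sort by ready time, breaking ties by deadline.
    order = sorted(range(len(tasks)), key=lambda i: (tasks[i][1], tasks[i][2]))
    subsets: List[List] = []  # each entry is [member indices, largest deadline]
    for i in order:
        d_i = tasks[i][2]
        # Part 1: while an earlier subset still reaches past d_i, the current task
        # is contained in one of its windows, so the last two subsets must merge.
        while len(subsets) >= 2 and subsets[-2][1] > d_i:
            members, deadline = subsets.pop()
            subsets[-1][0].extend(members)
            subsets[-1][1] = max(subsets[-1][1], deadline)
        # Part 2: join the last subset if contained in it, else open a new subset.
        if subsets and d_i < subsets[-1][1]:
            subsets[-1][0].append(i)
        else:
            subsets.append([[i], d_i])
    return [members for members, _ in subsets]


# Windows [0,4] and [1,3] nest, [5,8] stands alone, [6,10] contains [7,9].
print(decompose([(1, 0, 4), (1, 1, 3), (1, 5, 8), (1, 6, 10), (1, 7, 9)]))
# [[0, 1], [2], [3, 4]]
```

Each subset is pushed once and popped at most once, so the loop body runs a number of times proportional to n after sorting.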


D.  Scheduling  Scheme 

After tasks have been decomposed into a sequence of subsets, scheduling should be performed
on each subset in the sequence order, such that the schedule on each subset is of the
shortest length. A brute force method is to perform an exhaustive search whose computational
complexity amounts to $O(n_i!)$, where $n_i$ is the number of tasks in the $i$th subset.

In [Yuan89b], another scheme is explored for scheduling a subset. The method is to first
build a super-sequence where tasks may have several occurrences. The occurrence of a task
is decided by its relative window position in the subset. Selecting one occurrence for every
task in the super-sequence forms a schedule. A complete search costs $O(n_i^{[\ldots]})$ in the
worst case. In a few sample calculations of $n_i^{[\ldots]}$ with $n_i$ less than 100,
it is a smaller number than $n_i!$, as shown in the cited paper.

Since the set of schedules following the decomposed leading schedule sequence is dominant,
and since the subsets are scheduled in the sequence order with their shortest length, it
is proved that the decomposition scheduling with the leading relation is correct [Yuan89b].
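
The brute force variant of this scheduling scheme can be sketched as below. It is an illustration only: the function names are hypothetical, tasks are triples (c, r, d), the list of subsets is assumed to be already in decomposed leading schedule sequence order, and each subset is searched exhaustively for its shortest feasible schedule rather than with the super-sequence method.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Tuple

Task = Tuple[float, float, float]  # (c, r, d)


def shortest_schedule(subset: Sequence[int], tasks: List[Task],
                      clock: float) -> Optional[Tuple[float, Dict[int, float]]]:
    """Try every order of one subset; return (finish time, start times) of the shortest."""
    best = None
    for perm in permutations(subset):
        t, starts, ok = clock, {}, True
        for i in perm:
            c, r, d = tasks[i]
            s = max(t, r)          # earliest possible start
            if s + c > d:          # this order misses a deadline
                ok = False
                break
            starts[i] = s
            t = s + c
        if ok and (best is None or t < best[0]):
            best = (t, starts)
    return best


def schedule_subsets(subsets: List[List[int]],
                     tasks: List[Task]) -> Optional[Dict[int, float]]:
    """Schedule the subsets in sequence order, each with its shortest length."""
    clock, result = float("-inf"), {}
    for subset in subsets:
        best = shortest_schedule(subset, tasks, clock)
        if best is None:
            return None            # no feasible schedule exists
        clock, starts = best
        result.update(starts)
    return result


tasks = [(1, 0, 3), (1, 1, 5), (1, 2, 9), (1, 3, 4)]
print(schedule_subsets([[0], [1, 2, 3]], tasks))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

Backtracking never crosses a subset boundary here, which is where the reduction from n! to the sum of the n_i! comes from.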


V  Empirical  Study 

A.  Experiment 

In  order  to  observe  the  behavior  of  the  number  of  tasks  in  a  single  schedule  subset  and 
number  of  the  subsets  to  be  created  with  regard  to  the  number  of  tasks  to  be  scheduled, 
task  arrival  rate,  and  task  window  length,  we  conduct  an  experiment  as  an  example  to  see 
the  feasibility  of  our  approach  for  practical  implementation. 

The  outputs  we  are  interested  in  are: 

1.  the  number  of  single  schedule  subsets  (sss), 

2.  the  number  of  window  concurrences, 

3.  the  maximum  number  of  tasks  in  single  schedule  subsets, 

4.  the  minimum  number  of  tasks  in  single  schedule  subsets,  and 
5. the average number of tasks in single schedule subsets.

One window concurrence is counted for any two tasks $i$ and $j$ if $i$ and $j$ have a matching or
containing relation. We call the number of tasks in a single schedule subset the size of the subset.
Meanwhile, we change the following parameters independently to watch the change in
the outputs:

1.  the  number  of  total  tasks, 

2.  task  arrival  rate,  and 

3.  window  length. 

The data are shown in Tables 1-4¹ at the end of this paper. Following are the basic rules in
the experiment.

1. The computation time is uniformly distributed over $(0, \alpha]$.

2. The task interarrival is uniformly distributed over $[0, \beta)$. The arrival rate is $2/\beta$.

3. The window length is also randomly created by controlling the laxity for each task.
The laxity of a task is the difference between its window length and its computation
time. The laxity is uniformly distributed over $(0, \gamma)$. The distribution guarantees a
window length greater than the computation time for the task.

We notice that the arrival rate should be less than or equal to the service rate, otherwise
there are congestions in the system, which will result in deadline-missing. In other words,
$2/\beta \le 2/\alpha$. That is,

$\alpha \le \beta$.

The random numbers are provided by function drand48() in the UNIX operating system.
The numbers are uniformly distributed over [0, 1) [StevSS].
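
The rules above can be sketched as a small workload generator. This is an assumption-laden illustration, not the original experiment code: the function name `generate_tasks` and its parameters are hypothetical, Python's `random.Random` stands in for drand48(), and the laxity is drawn from a half-open interval that excludes zero so that every window is longer than its computation time.

```python
import random
from typing import List, Optional, Tuple

Task = Tuple[float, float, float]  # (c, r, d)


def generate_tasks(n: int, alpha: float, beta: float, gamma: float,
                   seed: Optional[int] = None) -> List[Task]:
    """Generate n tasks following the three rules of the experiment (sketch)."""
    rng = random.Random(seed)
    tasks: List[Task] = []
    ready = 0.0
    for _ in range(n):
        # Rule 2: interarrival time uniform over [0, beta); mean beta/2, rate 2/beta.
        ready += beta * rng.random()
        # Rule 1: computation time uniform over (0, alpha]; 1 - random() lies in (0, 1].
        c = alpha * (1.0 - rng.random())
        # Rule 3: positive laxity bounded by gamma, so the window exceeds c.
        laxity = gamma * (1.0 - rng.random())
        tasks.append((c, ready, ready + c + laxity))
    return tasks


for c, r, d in generate_tasks(3, alpha=4, beta=4, gamma=2, seed=1):
    print(f"c={c:.2f}  r={r:.2f}  d={d:.2f}")
```

Fixing the seed makes a run repeatable, which helps when comparing decompositions across parameter settings.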

In the experiment, we found that the minimum size of single schedule subsets is always 1.


B.  The  Result  Explanation  and  Observation 

From the experiment results, we found that when the average window length increases
($\gamma$ increases), the number of single schedule subsets reduces and the maximum size of
single schedule subsets slightly increases. The result is expected, since the larger some task
windows are, the more tasks may be concurrent with them. These tasks may be in the same
single schedule subset.

¹ Window is denoted by W., concurrence by concurr., average by avg., and the single
schedule subset by sss.

When $\beta$ increases, that is, the arrival rate decreases, the number of single schedule
subsets increases, and the maximum size of single schedule subsets decreases. The result is also
expected, since when the arrival rate decreases, the opportunity of tasks being concurrent with
each other decreases too. Most tasks have the leading relation with each other.


Figure 3: The relationship between the size of single schedule subsets and the number of
tasks with regard to the laxity parameter $\gamma$, where $\alpha = 4$, $\beta = 4$.

Fig. 3 shows the relationship between the maximum size of single schedule subsets and
the number of tasks to be scheduled. From the experiment, we found that the size of a
single schedule subset never exceeds 14 even when there are 300 tasks being scheduled. The
observation indicates that for most cases $n_i$ is bounded by a constant.

We  show  the  relationship  between  the  number  of  subsets  (m)  and  number  of  tasks  to 
be  scheduled  in  Fig.  4  and  Fig.  5  with  regard  to  different  window  length  and  arrival  rate 
distributions. 
Figure 4: The relationship between the number of single schedule subsets and the number
of tasks with regard to the laxity parameter $\gamma$, where $\alpha = 4$, $\beta = 4$.

VI  Final  Remarks 

In this paper, we examine the problem of nonpreemptive scheduling of $n$ tasks on a single
CPU in hard real-time systems. We propose a correct decomposition strategy for the
scheduling. The strategy significantly reduces the scheduling complexity for most cases.

In this paper we have examined a decomposition technique based only on the windows
of tasks. By taking into account the computation time requirements, the decomposition
can be made stronger [Yuan89a]. The decomposition approach may also be extended to
consider precedence and other dependences among tasks. This aspect of the decomposition
technique needs further study.
Figure 5: The relationship between the number of single schedule subsets and the number
of tasks with regard to the arrival rate parameter $\beta$, where $\alpha = 4$, $\gamma = 2$.
Table 1: α = 4, β = 4

num of    γ     avg W.    num of      num of    sss size
tasks           length    concurr.    sss       max    avg
 50        2    3.02        5          45         2    1.11
           4    4.33       1?          35         3    1.43
           6    5.08       27          25         5    2.00
           8    6.53       14          36         5    1.39
          10    7.03       26          30         6    1.67
100        2    3.11       1?          82         3    1.22
           4    4.18       28          74         5    1.35
           6    ?          37          68         6    1.47
           8    5.84       56          55         6    1.82
          10    7.01       77          41        14    2.44
150        2    ?          32         118         4    1.27
           4    3.78       40         114         4    1.32
           6    4.93       68          92         6    1.63
           8    5.85       91          73         7    2.05
          10    6.56      111          68        10    2.21
200        2    3.11       41         159         4    1.26
           4    3.67       64         144         4    1.39
           6    4.94       98         116         6    1.72
           8    5.94      102         113         8    1.77
          10    7.27      161          81         8    2.47
250        2    3.02       46         204         4    1.23
           4    3.82       93         167         5    1.50
           6    4.83      107         160         6    1.56
           8    6.33      162         121         8    2.07
          10    ?         186         103         ?    2.43
300        2    2.98       66         237         4    1.27
           4    3.90      105         205         5    1.46
           6    5.09      125         194         6    1.55
           8    6.21      179         153        11    1.96
          10    6.92      228         125        11    2.40
Table 4: α = 4, β = 10

num of    γ     avg W.    num of      num of    sss size
tasks           length    concurr.    sss       max    avg
 50        2    3.44        2          48         2    1.04
           4    4.11        5          45         3    1.11
           6    5.50        4          46         3    1.09
           8    6.69       11          41         4    1.22
          10    6.61       15          38         5    1.32
100        2    3.02        ?          89         4    1.12
           4    4.01       14          87         4    1.15
           6    4.95       21          80         4    1.25
           8    6.59       18          82         3    1.22
          10    6.68       25          76         5    1.32
150        2    3.00        6         144         2    1.04
           4    3.98       11         139         3    ?
           6    4.72       11         139         2    ?
           8    5.91       37         117         5    1.28
          10    ?          43         109         6    1.38
200        2    3.00       18         182         3    1.10
           4    3.97       27         174         5    ?
           6    5.19       23         177         3    ?
           8    ?          34         167         4    1.20
          10    6.90       52         153         4    1.31
250        2    2.94       24         228         3    1.10
           4    4.12       32         218         3    1.15
           6    5.10       43         213         4    1.17
           8    6.0?       45         206         3    1.21
          10    6.96       51         200         4    1.25
300        2    3.05       27         274         3    1.09
           4    4.07       21         280         3    1.07
           6    5.04       54         251         6    1.20
           8    6.16       46         254         3    1.18
          10    6.61       61         243         8    1.23
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•tCUHITY  Cl./>A4IFICATIOI<  or  THIS  PAOt 


Let  US  consider  the  problem  of  scheduling  a  set  of  n  tasks  on  a  single  processor 
such  that  a  feasible  schedule  which  satisfies  the  time  constraints  of  each  task  is 
generated.  It  is  recognized  that  an  exhaustive  search  may  be  required  to  generate 
such  a  feasible  schedule  or  to  assure  that  there  does  not  exists  one.  In  that  case 
the  computational  complexity  of  the  search  is  of  the  order  n!. 

We  propose  to  generate  the  feasible  schedule  in  two  steps.  In  the  first  step 
we  decompose  the  set  of  tasks  into  m  subsets  by  analyzing  their  ready  times  and 
deadlines.  An  ordering  of  these  subsets  is  also  specified  such  that  in  a  feasible 
schedule  all  tasks  in  an  earlier  subset  in  the  ordering  appears  before  tasks  in  a  later 
subset.  With  no  simplification  of  scheduling  of  tasks  in  a  subset,  the  scheduling 
complexity  is  n.!),  where  n.-  is  the  number  of  tasks  in  the  ith  subset. 

The  improvement  of  this  approach  towards  reducing  the  scheduling  complexity 
depends  on  the  the  number  and  the  size  of  subsets  generated.  Experimental  results 
indicates  that  significant  improvement  can  be  expected  in  most  situations. 
Abstract

A  simple  approach  to  inter-domain  routing  is  domain-level  source  routing  with  link-state 
approach  where  each  node  maintains  a  domain-level  view  of  the  internetwork.  This  does  not  scale 
up  to  large  internetworks.  The  usual  scaling  technique  of  aggregating  domains  into  superdomains 
loses  ToS  and  policy  detail. 

We present a new viewserver hierarchy and associated protocols that (1) satisfies policy
and ToS constraints, (2) adapts to dynamic topology changes including failures that partition
domains, and (3) scales well to a large number of domains without losing detail. Domain-level
views are maintained by special nodes called viewservers. Each viewserver maintains a
domain-level view of a surrounding precinct. Viewservers are organized hierarchically. To obtain
domain-level source routes, the views of one or more viewservers are merged (up to a maximum of twice
the levels in the hierarchy).

We also present a model for evaluating inter-domain routing protocols, and apply this model
to compare our viewserver hierarchy against the simple approach. Our results indicate that the
viewserver hierarchy finds many short valid paths and reduces the memory requirement
by two orders of magnitude.


Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network
Architecture and Design - packet networks; store and forward networks; C.2.2 [Computer-Communication
Networks]: Network Protocols - protocol architecture; C.2.m [Routing Protocols]; F.2.m [Computer Network
Routing Protocols].
views, opinions, and/or findings contained in this report are those of the author(s) and should not be interpreted as
representing the official policies, either expressed or implied, of the Advanced Research Projects Agency, PL, or the
U.S. Government.

The author is also supported by University of Maryland Graduate School Fellowship and Washington DC Chapter
of the ACM Samuel Alexander Fellowship.
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1  Introduction 


A computer internetwork, such as the Internet, is an interconnection of backbone networks, regional
networks, metropolitan area networks, and stub networks (campus networks, office networks and
other small networks)¹. Stub networks are the producers and consumers of the internetwork traffic,
while backbones, regionals, and MANs are transit networks. (Most of the networks in an
internetwork are stub networks.) Each network consists of nodes (hosts, routers) and links. Two networks
are neighbors when there are one or more links between nodes in the two networks (see Figure 1).


Figure  1:  A  portion  of  an  internetwork.  (Circles  represent  stub  networks.) 

An internetwork is organized into domains². A domain is a set of networks (possibly consisting of
only one network) administered by the same agency. Within each domain, an intra-domain routing
protocol is executed that provides routes between source and destination nodes in the domain. This
protocol can be any of the typical ones, i.e., next-hop or source routes computed using
distance-vector or link-state algorithms.

Across all domains, an inter-domain routing protocol is executed that provides routes
between source and destination nodes in different domains. This protocol must satisfy various
constraints:

(1) It must satisfy policy constraints, which are administrative restrictions on the inter-domain
traffic [8, 12, 9, 5]. Policy constraints are of two types: transit policies and source policies.
The transit policies of a domain A specify how other domains can use the resources of A
(e.g. $0.01 per packet, no traffic from domain B). The source policies of a domain A specify

¹ For example, NSFNET and MILNET are backbones, and Suranet and CerfNet are regionals.

² Also referred to as routing domains.

constraints on traffic originating from A (e.g. domains to avoid/prefer, acceptable connection
cost). Transit policies of a domain are public (i.e. available to other domains), whereas source
policies are usually private.

(2) An inter-domain routing protocol must also satisfy type-of-service (ToS) constraints of
applications (e.g. low delay, high throughput, high reliability, minimum monetary cost). To do
this, it must keep track of the types of services offered by each domain [5].

(3) Inter-domain routing protocols must scale up to very large internetworks, i.e. with a very large
number of domains. Practically this means that processing, memory and communication
requirements should be much less than linear in the number of domains.

(4) Inter-domain routing protocols must automatically adapt to link cost changes and node/link
failures and repairs, including failures that partition domains [15]. They must also handle
non-hierarchical domain interconnections at any level [9] (e.g. we do not want to hand-configure
special routes as "back-doors").

A simple (or straightforward) approach to inter-domain routing is domain-level source routing
with the link-state approach [8, 5]. In this approach, each router³ maintains a domain-level view of the
internetwork, i.e., a graph with a vertex for every domain and an edge between every two neighbor
domains. Policy and ToS information is attached to the vertices and the edges of the view.

When a source node needs to reach a destination node, it (or a router⁴ in the source's domain)
first examines this view and determines a domain-level source route satisfying ToS and policy
constraints, i.e., a sequence of domain ids starting from the source's domain and ending with the
destination's domain. Then, the packets are routed to the destination using this domain-level
source route and the intra-domain routing protocols of the domains crossed.
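
The simple approach described above can be illustrated with a minimal sketch. It assumes a hypothetical five-domain view and a deliberately reduced policy model in which a transit policy is just a set of source domains that a domain refuses to carry; the names `VIEW`, `REFUSES` and `find_route` are illustrative and do not come from the paper.

```python
from collections import deque

# Hypothetical domain-level view: one vertex per domain, edges to neighbor domains.
VIEW = {
    "S": ["R1", "R2"],
    "R1": ["S", "B"],
    "R2": ["S", "B"],
    "B": ["R1", "R2", "D"],
    "D": ["B"],
}

# Hypothetical transit policies: source domains whose traffic a domain refuses.
REFUSES = {"R1": {"S"}}


def find_route(view, refuses, src, dst):
    """Breadth-first search for a shortest domain-level source route from src
    to dst that only enters domains accepting traffic originating from src."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        here = path[-1]
        if here == dst:
            return path
        for nxt in view.get(here, []):
            if nxt in seen or src in refuses.get(nxt, set()):
                continue  # already explored, or policy forbids entering nxt
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no valid route in this view


route = find_route(VIEW, REFUSES, "S", "D")
```

With these assumed policies the search avoids R1 and returns the route through R2. Real policy and ToS constraints are richer than a refusal set, so this only shows the shape of the computation.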

The disadvantage of this simple scheme is that it does not scale up for large internetworks. The
storage at each router is proportional to $N_d \times E_d$, where $N_d$ is the number of domains and $E_d$
is the average number of neighbor domains of a domain. The communication cost is proportional
to $N_r \times E_r$, where $N_r$ is the number of routers in the internetwork and $E_r$ is the average number
of router neighbors of a router (topology changes are flooded to all routers in the internetwork).
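
As a rough illustration of these two proportionalities, the following sketch evaluates them for one set of numbers. Only the domain count of 11,110 appears in this paper (it is the topology size used in the evaluation); the neighbor averages and the router count are purely hypothetical values chosen for the example.

```python
def simple_scheme_costs(n_domains, avg_domain_neighbors, n_routers, avg_router_neighbors):
    """Return (storage per router, flooding cost) up to constant factors,
    i.e. N_d * E_d and N_r * E_r for the simple link-state scheme."""
    storage_per_router = n_domains * avg_domain_neighbors
    flooding_cost = n_routers * avg_router_neighbors
    return storage_per_router, flooding_cost


# 11,110 domains as in the evaluation; the other three values are assumptions.
storage, flooding = simple_scheme_costs(11_110, 4, 100_000, 3)
```

Both quantities grow linearly with the size of the internetwork, which is what constraint (3) above rules out.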

To  achieve  scaling,  several  approaches  based  on  aggregating  domains  into  superdomains  have 

³ Not all nodes maintain routing tables. A router is a node that maintains a routing table.

⁴ Referred to as the policy server in [8].

been proposed [13, 16, 6]. These approaches have drawbacks because the aggregation results in loss
of detail (discussed in Section 2).

Our  protocol 

In this paper, we present an inter-domain routing protocol that we have proposed recently [3]. It
combines domain-level views with a novel hierarchical scheme. It scales well to large internetworks,
and does not suffer from the problems of superdomains.

In our scheme, domain-level views are not maintained by every router but by special nodes
called viewservers. For each viewserver, there is a subset of domains around it, referred to as the
viewserver's precinct. The viewserver maintains the domain-level view of its precinct. This solves
the scaling problem for the storage requirement.

A viewserver can provide domain-level source routes between source and destination nodes in
its precinct. Obtaining a domain-level source route between a source and a destination that are
not in any single view involves accumulating the views of a sequence of viewservers. To make this
process efficient, viewservers are organized hierarchically in levels, and an associated addressing
structure is used. Each node has a set of addresses. Each address is a sequence of viewserver ids of
decreasing levels, starting at the top level and going towards the node. The idea is that when the
views of the viewservers in an address are merged, the merged view contains domain-level routes
to the node from the top level viewservers. (Addresses are obtained from name servers in the same
way as is currently done in the Internet.)

We handle dynamic topology changes such as node/link failures and repairs, link cost changes,
and domain partitions. Gateways⁵ detect domain-level topology changes affecting their domains and
neighbor domains. For each domain, there is a reporting gateway that communicates these changes
by flooding to the viewservers in a specified subset of domains; this subset is referred to as its flood
area. Hence, the number of packets used during flooding is proportional to the size of the flood
area. This solves the scaling problem for the communication requirement.

Thus our inter-domain routing protocol consists of two subprotocols: a view-query protocol
between routers and viewservers for obtaining merged views; and a view-update protocol
between gateways and viewservers for updating domain-level views.

⁵ A node is called a gateway if it has a link to another domain.

Evaluation


Many inter-domain routing protocols have been proposed, based on various kinds of hierarchies.
How do these protocols compare against each other and against the simple approach? To answer this
question, we need a model in which we can define internetwork topologies, policy/ToS constraints,
inter-domain routing hierarchies, and evaluation measures (e.g. memory and time requirements)
for inter-domain routing protocols. None of these protocols have been evaluated in a way that they
can be compared against each other or the simple approach.

In  this  paper,  we  present  such  a  model,  and  use  it  to  compare  our  viewserver  hierarchy  to  the 
simple  approach.  Our  evaluation  measures  are  the  amount  of  memory  required  at  the  source  and 
at  the  routers,  the  amount  of  time  needed  to  construct  a  path,  and  the  number  of  valid  paths 
found  (and  their  lengths)  in  comparison  to  the  number  of  available  valid  paths  (and  their  lengths) 
in the internetwork. We use three internetwork topologies, each with 11,110 domains (roughly the
current size of the Internet). Our results indicate that the viewserver hierarchy finds many short
valid paths and reduces the memory requirement by two orders of magnitude.

Organization  of  the  paper 

In  Section  2,  we  survey  recent  approaches  to  inter-domain  routing.  In  Section  3,  we  present  the 
view-query  protocol  for  static  network  conditions,  that  is,  assuming  all  links  and  nodes  of  the 
network  remain  operational.  In  Section  4,  we  present  the  view-update  protocol  to  handle  topology 
changes  (this  section  is  not  needed  for  the  evaluation  part).  In  Section  5,  we  present  our  evaluation 
model and results from its application to the viewserver hierarchy. In Section 6, we conclude and
describe how to add fault-tolerance and caching schemes to improve performance.

2  Related  Work 

In this section, we survey recently proposed inter-domain routing protocols that support ToS and
policy routing for large internetworks [14, 16, 13, 10, 6, 20, 2, 19, 18, 7].

Several inter-domain routing protocols (e.g. BGP [14], IDRP [16], NR [10]) are based on the
path-vector approach [17]. Here, for each destination domain a router maintains a set of paths, one
through each of its neighbor routers. ToS and policy information is attached to these paths. Each
router requires $O(N_d \times N_r \times E_r)$ space. For each destination, a router exchanges its best valid
path⁶ with its neighbor routers. However, a path-vector algorithm may not find a valid path
from a source to the destination even if such a route exists [13]⁷. By exchanging k paths to each
destination, the probability of detecting a valid path for each source can be increased.

The most common approach to solve the scaling problem is to use superdomains⁸ (e.g. IDPR [13],
IDRP [16], Nimrod [6]). Superdomains extend the idea of area hierarchy [11]. Here, domains are
grouped hierarchically into superdomains: "close" domains are grouped into level 1 superdomains,
"close" level 1 superdomains are grouped into level 2 superdomains, and so on. Each domain
A is addressed by concatenating the superdomain ids starting from a top level superdomain and
going down towards A. A router maintains a view that contains the domains in the same level 1
superdomain, the level 1 superdomains in the same level 2 superdomain, and so on. Thus a router
maintains a smaller view than it would in the absence of hierarchy. Each superdomain has its own
ToS and policy constraints derived from those of its subdomains.

There are several major problems with using superdomains. One problem is that if there are
domains with different (possibly contradictory) constraints in a superdomain, then there is no good
way of deriving the ToS and policy constraints of the superdomain. The usual techniques are to
take either the union or the intersection of the constraints of the subdomains [13]. Both techniques
have problems⁹. Other problems are described in [6, 2]. Some of the problems can be relaxed by
having overlapping superdomains, but this increases the storage requirements drastically.
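
The difficulty with both aggregation techniques can be seen in a small sketch. It assumes a toy model in which each subdomain's constraint is a set of source domains it bans; the names `aggregate_bans` and `SUBDOMAIN_BANS` are hypothetical and only serve this illustration.

```python
def aggregate_bans(bans_by_subdomain, how):
    """Derive a superdomain's ban set from its subdomains' ban sets,
    using either the union or the intersection of the constraints."""
    ban_sets = list(bans_by_subdomain.values())
    if how == "union":
        return set().union(*ban_sets)
    return set.intersection(*ban_sets)


# Hypothetical superdomain with two subdomains: A accepts everyone, B bans S.
SUBDOMAIN_BANS = {"A": set(), "B": {"S"}}

# Union: S is banned from the whole superdomain, so a path from S through A
# is lost even though A itself would have accepted the traffic.
union_bans = aggregate_bans(SUBDOMAIN_BANS, "union")

# Intersection: S is admitted to the superdomain, so B may be handed
# traffic from S that it would otherwise refuse.
intersection_bans = aggregate_bans(SUBDOMAIN_BANS, "intersection")
```

Either way the superdomain's derived constraint misrepresents at least one subdomain, which is the loss of detail referred to above.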

Nimrod [6] and IDPR [13] use the link-state approach, domain-level source routing, and
superdomains (non-overlapping superdomains for Nimrod). IDRP [16] uses the path-vector approach and
superdomains.

Reference [10] combines the benefits of the path-vector approach and the link-state approach by having
two modes: an NR mode, which is an extension of IDRP and is used for the most common ToS
and policy constraints; and an SDR mode, which is like IDPR and is used for less frequent ToS and

⁶ A valid path is a path that satisfies the ToS and policy constraints of the domains in the path.

⁷ For example, suppose a router u has two paths $P_1$ and $P_2$ to the destination. Let u have a router neighbor v,
which is in another domain. u chooses and informs v of one of the paths, say $P_1$. But $P_1$ may violate source policies
of v's domain, and $P_2$ may be a valid path for v.

⁸ Also referred to as routing domain confederations.

⁹ For example, if the union is taken, then a subdomain A can be forced to obey constraints of other subdomains;
this may eliminate a path through A which is otherwise valid. If the intersection is taken, then a subdomain A can
be forced to accept traffic it would otherwise not accept.

policy requests. This study does not address the scalability of the SDR mode.

In [2], we proposed another protocol based on superdomains. It always finds a valid path if
one exists. Both union and intersection policy and ToS constraints are maintained for each visible
superdomain. If the union policy constraints of superdomains on a path are satisfied, then the path
is valid. If the intersection policy constraints of a superdomain are satisfied but the union policy
constraints are not, the source uses a query protocol to obtain a more detailed "internal" view of
the superdomain, and searches again for a valid path. The protocol uses a link-state view update
protocol to handle topology changes, including failures that partition superdomains at any level.

The landmark hierarchy [19, 18] is another approach for solving the scaling problem. Here,
each router is a landmark with a radius, and routers which are within a radius away from the
landmark maintain a route to it. Landmarks are organized hierarchically, such that the radius
of a landmark increases with its level, and the radii of top level landmarks include all routers.
Addressing and packet forwarding schemes are introduced. Link-state algorithms cannot be used
with the landmark hierarchy, and a thorough study of enforcing ToS and policy constraints with
this hierarchy has not been done.

The landmark hierarchy may look similar to our viewserver hierarchy, but in fact they are quite
opposite. In the landmark hierarchy, nodes within the radius of the landmark maintain a route to
the landmark, and the landmark may not have a route to these nodes. In the viewserver hierarchy,
a viewserver maintains routes (i.e. a view) to the nodes in its precinct.

Route fragments [7] is an addressing scheme. A destination route fragment, called a route
suffix, is a sequence of domain ids from a backbone to the destination domain. A source route
fragment, called a route prefix, is the reverse of a route suffix of that domain. There are also route
middles, which are from transit domains to transit domains. These addresses are static (i.e. they
are not updated with topology changes) and stored at the name servers. A source queries a name
server and obtains destination route suffixes. It then chooses an appropriate route suffix for the
destination and concatenates it with its own route prefix (and uses route middles if the route suffix
and route prefix do not intersect). This scheme cannot handle topology changes and does not
address handling policy and ToS constraints.
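
The concatenation step of route fragments might look as follows. This is a sketch under two assumptions that the text above does not spell out: a prefix and a suffix "intersect" when they share a domain id, and a route middle is usable when it starts at the last domain of the prefix and ends at the first domain of the suffix. The function name `join_fragments` and the domain ids are hypothetical.

```python
def join_fragments(prefix, suffix, middles=()):
    """Join a source route prefix with a destination route suffix.
    Returns a list of domain ids, or None if no static route can be built."""
    # Case 1: the fragments intersect; splice them at the first shared domain.
    for i, domain in enumerate(prefix):
        if domain in suffix:
            return prefix[:i] + suffix[suffix.index(domain):]
    # Case 2: bridge the gap with a route middle between transit domains.
    for middle in middles:
        if middle[0] == prefix[-1] and middle[-1] == suffix[0]:
            return prefix + middle[1:-1] + suffix
    return None


# The source's suffix is B1 -> R1 -> S, so its prefix is the reverse.
source_prefix = list(reversed(["B1", "R1", "S"]))
same_backbone = join_fragments(source_prefix, ["B1", "R2", "D"])
other_backbone = join_fragments(source_prefix, ["B2", "R3", "D"], [["B1", "T", "B2"]])
```

Because every fragment is static, a failed link on any of these fragments leaves the source with no way to repair the route, which is the limitation noted above.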
3 Viewserver Hierarchy Query Protocol


In this section, we present our scheme for static network conditions, that is, all links and nodes
remain operational. The dynamic case is presented in Section 4.

Conventions: Each domain has a unique id. DomainIds denotes the set of domain-ids. Each
node has an id which is unique in its domain. NodeIds denotes the set of node-ids. Thus, a node is
totally identified by the combination of its domain's id and its node-id. TotalIds denotes the set
of total node-ids. For a node u, we use domainid(u) to denote the domain-id of u's domain. We
use nodeid(u) and totalid(u) to denote the node-id and total-id of u respectively. For a domain A,
we use domainid(A) to denote the domain-id of A. NodeNeighbors(u) denotes the set of node-ids
of the neighbors of u. DomainNeighbors(A) denotes the set of domain-ids of the domain neighbors
of A. We use the term gateway-id (or viewserver-id) to mean the total-id of a gateway node (or a
viewserver node).

In our protocol, a node u uses two kinds of sends. The first kind has the form "Send(m) to v",
where m is the message being sent and v is the total-id of the destination. Here, nodes u and v
are neighbors, and the message is sent over the physical link (u, v). If the link is down, we assume
that the packet is dropped.

The second kind of send has the form "Send(m) to v using dlsr", where m and v are as above
and dlsr is a domain-level source route between u and v. Here, the message is sent using the
intra-domain routing protocols of the domains in dlsr to reach v¹⁰. We assume that as long as there is a
sequence of up links connecting the domains in dlsr, the message is delivered to v¹¹. If u and
v are in the same domain, dlsr equals $\langle\rangle$.

Views and Viewservers

Domain-level views are maintained by special nodes called viewservers. Each viewserver has a
precinct, which is a set of domains around the viewserver, and a static view, which is a domain-level
view of the precinct and outgoing edges. The static view includes the ToS and policy constraints

¹⁰ Recall that given a domain-level source route to a destination, using the intra-domain routing protocols we can
reach the destination.

¹¹ This involves time-outs, retransmissions, etc. It requires transport protocol support such as TCP.

of domains in the precinct and of domain-level edges¹². Formally, a viewserver x maintains the
following:

$Precinct_x$ ($\subseteq DomainIds$). Domain-ids whose view is maintained.

$SView_x$. Static view of x.

$SView_x = \{\langle A,\ policy\&tos(A),\ \{\langle B,\ edge\_policy\&tos(A,B)\rangle : B \in \text{subset of } DomainNeighbors(A)\}\rangle : A \in Precinct_x\}$

$SView_x$ can be implemented as an adjacency list representation of graphs [1]. The intention of
$SView_x$ is to obtain domain-level source routes between nodes in $Precinct_x$. Hence, the choice of
domains to include in $Precinct_x$ and the choice of neighbors of domains to include in $SView_x$ is
not arbitrary. $Precinct_x$ and $SView_x$ must be connected; that is, between any two domains in
$Precinct_x$, there should be a path in $SView_x$ that lies in $Precinct_x$. Note that $SView_x$ can contain
edges to domains outside $Precinct_x$. We say that a domain A is in the view of a viewserver x if
either A is in the precinct of x or $SView_x$ has an edge from a domain in the precinct to A. Note that
the precincts and views of different viewservers can be overlapping, identical or disjoint.
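
The adjacency-list representation and the two conditions above (connectedness of the precinct and membership in the view) can be sketched as follows. The dictionary layout, the placeholder edge labels, and the function names are assumptions made for this illustration; the sketch also assumes that each edge inside the precinct is stored in both directions.

```python
def precinct_connected(precinct, sview):
    """Check that any two precinct domains are joined by a path in the
    static view that stays inside the precinct."""
    if not precinct:
        return True
    start = next(iter(precinct))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for b in sview.get(a, {}):
            if b in precinct and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen == set(precinct)


def in_view(precinct, sview, domain):
    """A domain is in the view if it is in the precinct or if the static
    view has an edge from a precinct domain to it."""
    return domain in precinct or any(domain in sview.get(a, {}) for a in precinct)


# Hypothetical viewserver state: domain -> {neighbor: edge policy/ToS label}.
PRECINCT = {"A", "B", "C"}
SVIEW = {
    "A": {"B": "edge-info"},
    "B": {"A": "edge-info", "C": "edge-info"},
    "C": {"B": "edge-info", "X": "edge-info"},  # X lies outside the precinct
}
```

Here X is in the view without being in the precinct, so the viewserver knows an edge to X but, as noted later for destinations outside the precinct, not X's own policies.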

If there is a viewserver x whose view contains both the source and the destination domains,
then x's view can be used to obtain the required domain-level source route to reach the destination.
The source needs to reach x to obtain its view. If the source and x are in the same domain, x
can be reached using the intra-domain routing protocol. If x is in another domain, then the source
needs to have a domain-level source route to it¹³. In this case, we assume that the source has a set of
fixed domain-level source routes to x.

Viewserver  Hierarchy 

For scaling reasons, we cannot have one large view. Thus, obtaining a domain-level source route
between a source and a destination which are far away involves accumulating views of a sequence of
viewservers. To keep this process efficient, we organize viewservers hierarchically. More precisely,
each viewserver is assigned a hierarchy level from 0, 1, ..., with 0 being the top level in the hierarchy.
A parent/child relationship between viewservers is defined as follows:

¹² Not all the domain-level edges need to be included. This is because some domains may have many neighbors,
causing a big storage requirement.

¹³ We cannot obtain this domain-level source route from x, i.e. chicken-egg problem.

1. Every level i viewserver, i > 0, has a parent viewserver whose level is less than i.

2. If viewserver x is a parent of viewserver y then x's view contains y's domain and y's view
contains x's domain¹⁴.

3. The view of a top level viewserver contains the domains of all other top level viewservers
(typically, top level viewservers are placed in backbones).

Note that the third constraint does not mean that all top level viewservers have the same view.
In the hierarchy, a parent can have many children and a child can have many parents. We extend
the range of the parent-child relationship to ordinary nodes; that is, if $Precinct_x$ contains the
domain of node u, we say that u is a child of x, and x is a parent of u (note that an ordinary node
does not have a child). We assume that there is at least one parent viewserver for each node.

For a node u, an address is defined to be a sequence $\langle a_0, a_1, \ldots, a_t \rangle$ such that $a_i$ for $i < t$ is
a viewserver-id, $a_0$ is a top level viewserver-id, $a_t$ is the total-id of u, and $a_i$ is a parent of $a_{i+1}$.
Note that a node may have many addresses since the parent-child relationship is many-to-many. If
a source wants a domain-level source route to a destination, it first queries the name servers to
obtain a set of addresses for the destination. Then, it queries viewservers to obtain an accumulated
view containing both its domain and the destination's domain.
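
The address definition can be checked mechanically. The sketch below assumes the parent-child relationship is available as a map from each viewserver or node to its set of parents; `is_address`, `PARENTS` and `TOP_LEVEL` are hypothetical names for this illustration.

```python
def is_address(addr, node, parents, top_level):
    """Check that addr = [a_0, ..., a_t] starts at a top level viewserver,
    ends at the node, and that each a_i is a parent of a_(i+1)."""
    if not addr or addr[0] not in top_level or addr[-1] != node:
        return False
    return all(
        addr[i] in parents.get(addr[i + 1], set())
        for i in range(len(addr) - 1)
    )


# Hypothetical hierarchy: T is top level; V1 and V2 are children of T;
# node u lies in the precincts of both V1 and V2.
TOP_LEVEL = {"T"}
PARENTS = {"V1": {"T"}, "V2": {"T"}, "u": {"V1", "V2"}}
```

Since u has two parents, both ["T", "V1", "u"] and ["T", "V2", "u"] are addresses of u, which illustrates why a node may have many addresses.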

Querying the name servers can be done the same way it is done currently in the Internet. It
requires nodes to have a set of fixed addresses to name servers. This is also sufficient in our case.
However, we can improve the performance by having a set of fixed domain-level source routes
instead.

View-Query Protocol: Obtaining Domain-Level Source Routes

We  now  describe  how  a  domain-level  source  route  is  obtained  (regardless  of  whether  the  source  and 
the  destination  are  in  a  common  view  or  not). 

We want a sequence of viewservers whose merged views contain both the source and the
destination domains. Addresses provide a way to obtain such a sequence, by first going up in
the viewserver hierarchy starting from the source node and then going down in the viewserver
hierarchy towards the destination node. More precisely, let $\langle s_0, \ldots, s_t \rangle$ be an address of the source
¹⁴ Note that x and y do not have to be in each other's precinct.

¹⁵ In fact, name servers are called domain name servers. However, domain names and the domains used in this
paper are different. We use "name servers" to avoid confusion.

and $\langle d_0, \ldots, d_l \rangle$ be an address of the destination. Then, the sequence $\langle s_{t-1}, \ldots, s_0, d_0, \ldots, d_{l-1} \rangle$
meets our requirements¹⁶. In fact, going up all the way in the hierarchy to top level viewservers
may not be necessary. We can stop going up at a viewserver $s_i$ if there is a viewserver $d_j$, $j < l$,
such that the domain of $d_j$ is in the view of $s_i$ (one special case is where $s_i = d_j$).
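
The up-then-down traversal with early stopping can be sketched centrally as below. This is only an illustration: in the protocol the sequence emerges hop by hop as viewservers forward packets, and choosing the largest usable j is an assumption borrowed from the forwarding rule described later. The maps `view_of` and `domain_of` and all ids are hypothetical.

```python
def viewserver_sequence(s_addr, d_addr, view_of, domain_of):
    """Walk up the source address; stop at the first s_i whose view contains
    the domain of some viewserver d_j (j < l), then walk down from d_j."""
    up = []
    for s in reversed(s_addr[:-1]):  # s_(t-1), ..., s_0
        up.append(s)
        for j in range(len(d_addr) - 2, -1, -1):  # try the largest j first
            if domain_of[d_addr[j]] in view_of[s]:
                down = d_addr[j:-1]
                if down and down[0] == s:  # special case s_i = d_j
                    down = down[1:]
                return up + down
    return up + d_addr[:-1]  # full sequence through the top level


S_ADDR = ["T1", "V1", "src"]
D_ADDR = ["T2", "V2", "dst"]
DOMAIN_OF = {"T1": "B1", "T2": "B2", "V1": "R1", "V2": "R2"}
VIEW_OF = {"V1": {"S", "R1", "B1"}, "T1": {"B1", "B2", "R1"}}

full = viewserver_sequence(S_ADDR, D_ADDR, VIEW_OF, DOMAIN_OF)
```

If V1's view also contained R2 (the domain of V2), the walk would stop at V1 and the sequence would shrink to V1 followed by V2.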

The  view-query  protocol  uses  two  message  types: 

• (RequestView, s_address, d_address)

where s_address and d_address are the addresses for the source and the destination
respectively. A RequestView message is sent by a source to obtain an accumulated view containing
both the source and the destination domains. When a viewserver receives a RequestView
message, it either sends back its view or forwards this request to another viewserver.

• (ReplyView, s_address, d_address, accumview)

where s_address and d_address are as above and accumview is the accumulated view. A
ReplyView message is sent by a viewserver to the source or to another viewserver closer to
the source. The accumview field in a ReplyView message equals the union of the views of
the viewservers the message has visited.

We now describe the events of a source node (see Figure 2). The source node¹⁷ sends a
RequestView packet containing a source and a destination address to its parent in the source
address (using a fixed domain-level source route). When the source receives a ReplyView packet, it
chooses a valid path using the accumview in the packet. If it does not find a valid path, it can
try again using a different source and/or destination address. Note that the source does not have
to throw away the previous accumulated views, but can merge all accumulated views into a richer
accumulated view. In fact, it is easy to change the protocol so that the source also obtains views of
individual viewservers to make the accumulated view even richer.

The events of a viewserver x are specified in Figure 3. Upon receiving a RequestView packet,
x checks if the destination domain is in its precinct¹⁸. If it is, x sends back its view in a ReplyView
packet. If it is not, x forwards the request packet to another viewserver as follows: x checks if the
domain of any viewserver in the destination address is in its view or not. If there is such a domain,

¹⁶ This is similar to matching route fragments [7]. However, in our case the sequence is computed in a distributed
fashion (this is needed to handle topology changes).

¹⁷ Or the policy server in the source's domain.

¹⁸ Even though the destination can be in the view of x, its policies and ToS's are not in the view if it is not in the
precinct of x.

Constants

    FixedRoutes_u(x), for every viewserver-id x such that x is a parent of u,
        = {⟨⟩}                                 if domainid(u) = domainid(x)
        = {⟨d_1, ..., d_n⟩ : d_i ∈ DomainIds}   otherwise. Set of domain-level routes to x.

Events

    RequestView_u(s_address, d_address)   {Executed when u wants a valid domain-level source route}
        Let s_address be ⟨s_0, ..., s_{t-1}, s_t⟩, and dlsr ∈ FixedRoutes_u(s_{t-1});
        Send(RequestView, s_address, d_address) to s_{t-1} using dlsr

    Receive_u(ReplyView, s_address, d_address, accumview)
        Choose a valid domain-level source route using accumview;
        If a valid route is not found
            Execute RequestView_u again with another source address and/or destination address


Figure 2: View-query protocol: Events and state of a source u.


Constants

    Precinct_x. Precinct of x.
    SView_x. Static view of x.

Events

    Receive_x(RequestView, s_address, d_address)
        Let d_address be ⟨d_0, ..., d_l⟩;
        if domainid(d_l) ∉ Precinct_x then
            forward_x(RequestView, s_address, d_address, {});
        else forward_x(ReplyView, d_address, s_address, SView_x);   {addresses are switched}
        endif

    Receive_x(ReplyView, s_address, d_address, view)
        forward_x(ReplyView, s_address, d_address, view ∪ SView_x)

    where procedure forward_x(type, s_address, d_address, view)
        Let s_address be ⟨s_0, ..., s_t⟩, d_address be ⟨d_0, ..., d_l⟩;
        if ∃i : domainid(d_i) in SView_x then
            Let i = max{j : domainid(d_j) in SView_x};
            target := d_i;
        else target := s_i such that s_{i+1} = totalid(x);
        endif;
        dlsr := choose a route to domainid(target) from domainid(x) using SView_x;
        if type = RequestView then
            Send(RequestView, s_address, d_address) to target using dlsr;
        else Send(ReplyView, s_address, d_address, view) to target using dlsr;
        endif


Figure 3: View-query protocol: Events and state of a viewserver x.


x sends the RequestView packet to the last such one. Otherwise x is a viewserver in the source
address and sends the packet to its parent in the source address. (Note that if x is a viewserver in
the destination address, its child in the destination address is definitely in its view.)

When a viewserver x receives a ReplyView packet, it merges its view into the accumulated view
in the packet. Then it sends the ReplyView packet towards the source node in the same way it would
send a RequestView packet towards the destination node (i.e. the roles of the source address and
the destination address are exchanged).
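
The request and reply phases described above can be simulated in a few lines. This is a sketch under stated assumptions, not the protocol itself: message delivery is replaced by a loop, a view is a plain map from a domain to its neighbor set (policy and ToS data omitted), a request is forwarded only to viewservers in the destination address while a reply may be forwarded to the source node itself, and the hop limit is a safety guard added for the example. All names and the three-viewserver topology are hypothetical.

```python
def in_view(sview, domain):
    """True if the domain is a vertex of the view or the head of one of its edges."""
    return domain in sview or any(domain in nbrs for nbrs in sview.values())


def forward_target(x, sview, s_address, d_address, domain_of, include_node):
    """Prefer the last element of d_address whose domain is in x's view;
    otherwise fall back to x's parent in s_address."""
    candidates = d_address if include_node else d_address[:-1]
    hits = [d for d in candidates if d != x and in_view(sview, domain_of[d])]
    if hits:
        return hits[-1]
    return s_address[s_address.index(x) - 1]


def view_query(servers, domain_of, s_address, d_address, max_hops=32):
    """Return (viewservers visited by the request, accumulated view)."""
    x, visited = s_address[-2], []
    for _ in range(max_hops):  # request phase
        precinct, sview = servers[x]
        visited.append(x)
        if domain_of[d_address[-1]] in precinct:
            break
        x = forward_target(x, sview, s_address, d_address, domain_of, False)
    else:
        raise RuntimeError("request did not reach the destination's precinct")
    accum = {}
    for _ in range(max_hops):  # reply phase, with the addresses switched
        if x == s_address[-1]:
            return visited, accum
        sview = servers[x][1]
        for dom, nbrs in sview.items():  # accumview := accumview U SView_x
            accum.setdefault(dom, set()).update(nbrs)
        x = forward_target(x, sview, d_address, s_address, domain_of, True)
    raise RuntimeError("reply did not reach the source")


# Hypothetical hierarchy: T (backbone B) is the parent of V1 (R1) and V2 (R2).
DOMAIN_OF = {"T": "B", "V1": "R1", "V2": "R2", "u": "S", "w": "D"}
SERVERS = {
    "V1": ({"S", "R1"}, {"S": {"R1"}, "R1": {"S", "B"}}),
    "T": ({"B", "R1", "R2"}, {"B": {"R1", "R2"}, "R1": {"B"}, "R2": {"B"}}),
    "V2": ({"R2", "D"}, {"R2": {"B", "D"}, "D": {"R2"}}),
}
visited, accum = view_query(SERVERS, DOMAIN_OF, ["T", "V1", "u"], ["T", "V2", "w"])
```

In this run the request travels V1, T, V2 and the reply returns through T and V1, so the accumulated view contains the chain S, R1, B, R2, D from which the source can pick a route.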

Above we have described one possible way of obtaining the accumulated views. There are
various other possibilities, for example: (1) restricting the ReplyView packet to take the reverse
of the path that the RequestView packet took; (2) having ReplyView packets go all the way
up in the viewserver hierarchy for a richer accumulated view; (3) source polling the viewservers
directly instead of viewservers forwarding request/reply messages to each other; (4) not including
the non-transit stub domains other than the source and the destination domains in the accumview;
(5) including some source policy constraints and ToS requirements in the RequestView packet,
and having the viewservers filter out some domains.

4  Update  Protocol  for  Dynamic  Network  Conditions 

In this section, we first examine how topology changes such as link/node failures, repairs, and cost
changes map into domain-level topology changes. Second, we describe how domain-level topology
changes are detected and communicated to viewservers, i.e. the view-update protocol. Third, we modify
the view-query protocol appropriately.

Mapping  Topology  Changes  to  Domain-Level  Topology  Changes 

Costs are associated with domain-level edges. The cost of the domain-level edge (A, B) equals a
vector of values if the link is up; each cost value indicates how expensive it is to cross domain A
to reach domain B according to some criteria such as delay, throughput, reliability, etc. The cost
equals ∞ if all links from A to B are down. (Note that if a gateway connecting A to B is down, its
links are also considered to be down.) Each cost value of a domain-level edge (A, B) can
be derived from the cost values of the intra-domain routes in A and links from A to B [4].
For example, the delay of a domain-level edge (A, B) can be calculated as the maximum/average
delay of the routes from any gateway in A to the first gateway in B.
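
As a small illustration of deriving one cost value, the sketch below computes the delay of a domain-level edge from per-route delays. The function name `edge_delay` and its input format are assumptions made for this example; the paper does not prescribe a particular data layout.

```python
import math
from statistics import mean
from typing import Iterable, List


def edge_delay(route_delays: Iterable[float], use_max: bool = True) -> float:
    """Delay of domain-level edge (A, B) from per-route delays.

    route_delays holds the delay of each usable route from a gateway in A to
    the first gateway in B (hypothetical input). An empty collection means
    that all links from A to B are down, so the cost is infinite.
    """
    delays: List[float] = list(route_delays)
    if not delays:
        return math.inf
    return max(delays) if use_max else mean(delays)
```

Other cost values (throughput, reliability) would need their own combining rule; only delay is shown here.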
Link cost changes and link/node failures and repairs correspond to cost changes, failures and
repairs of domain-level edges. Link/node failures can also partition a domain into cells [15]. A cell
is a maximal subset of nodes of a domain that can reach each other without leaving the domain.
With partitioning, some nodes as well as some neighbor domains may not be accessible by all
cells. In the same way, link/node repairs may merge cells into bigger cells. We identify a cell
with the minimum node-id of the gateways in the cell. In this paper, for uniformity we treat
an unpartitioned domain as a domain with one cell; we do not consider cells that do not isolate
gateways since such cells do not affect inter-domain routes.
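
The cell definition above amounts to finding connected components of a domain and naming each by its smallest gateway id. The following sketch shows one way to do this; `find_cells` is a hypothetical helper, node ids are assumed to be integers, and components without any gateway are skipped, which is our reading of the cells that are not considered.

```python
from typing import Dict, Iterable, Set


def find_cells(nodes: Iterable[int], links: Iterable[tuple],
               gateways: Set[int]) -> Dict[int, Set[int]]:
    """Map cell-id -> nodes of the cell, for one domain.

    links are the currently working intra-domain links (undirected pairs).
    Components that contain no gateway are skipped (an interpretation).
    """
    adjacency: Dict[int, Set[int]] = {n: set() for n in nodes}
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)

    cells: Dict[int, Set[int]] = {}
    seen: Set[int] = set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects one maximal connected subset.
        component, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            stack.extend(adjacency[n] - component)
        seen |= component
        cell_gateways = component & gateways
        if cell_gateways:
            cells[min(cell_gateways)] = component  # cell-id = minimum gateway id
    return cells
```

Repairing a link simply changes the input; two cells then show up as one larger cell whose id is the smaller of the two former ids.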

If a domain gets partitioned, its vertex in the domain-level views should be split into as many
pieces as there are cells, and when the cells merge, the corresponding vertices should be merged
as well. (Our cells are like the domain components of IDPR [13].)

Since a domain can be partitioned into many cells, domain-level source routes now include
cell-ids as well. Hence, the intra-domain routing protocol of a domain should include a route to each
reachable neighbor domain cell. This involves the following changes in the intra-domain routing
protocol: (1) Whenever the cell-id of a gateway changes, it reports its new cell-id to adjacent
gateways in neighbor domains. When they receive this information, they update their intra-domain
routes to include the new cell-id. (2) Usually when a node recovers from a failure, it queries its
neighbors in its domain for their intra-domain routes. When a gateway recovers, it should also query
adjacent gateways in neighbor domains for their cell-ids.

View-Update Protocol: Updating Domain-Level Views

Viewservers do not communicate with each other to maintain their views. Gateways detect and
communicate domain-level topology changes to viewservers. Each gateway periodically (and
optionally after a change in the intra-domain routing table) inspects its intra-domain routing table
and determines the cell to which it belongs. For each cell, only the gateway whose node-id is the
cell-id (i.e. the gateway with the minimum node-id) is responsible for communicating domain-level
topology changes. We refer to this gateway as the reporting gateway. Reporting gateways compute
the domain-level edge costs for each neighbor domain cell, and report them to parent viewservers.
They are also responsible for informing the viewservers of the creation and deletion of cells.

The communication between a reporting gateway and viewservers is done by flooding over a
set of domains. This set is referred to as the flood area. (For efficiency, the flood area can be
implemented by a radius and some forwarding limits, e.g. do not flood beyond backbones, instead of
a set.) The topology of a flood area must be a connected graph. Due to the nature of flooding, a
viewserver can receive information out of order for a domain cell. In order to avoid old information
replacing new information, each gateway includes successively increasing time stamps in the
messages it sends.

Due to node and link failures, communication between a reporting gateway and a viewserver
can fail, resulting in the viewserver having out-of-date information. To eliminate such information,
a viewserver deletes any information about a domain cell if it is older than a time-to-die period. We
assume that gateways send messages more often than the time-to-die value (to avoid false removal).

When a viewserver learns of a new domain cell, it adds it to its view. To avoid adding a domain
cell which was just deleted, when a viewserver receives a delete domain cell request, it only marks
the domain cell as deleted (and removes the entry after the time-to-die period). (If the domain cell
were removed immediately, the timestamp for that domain cell would also be lost.)

The view-update protocol uses two types of messages as follows:

•  (UpdateCell, domainid, cellid, timestamp, floodarea, ncostset)

is sent by the reporting gateway to inform the viewservers about current domain-level edge
costs of its cell. Here, domainid, cellid, and timestamp indicate the domain, the cell and the
time stamp of the reporting gateway, ncostset contains a cost for each neighbor domain cell,
and floodarea is the set of domains that this message is to be sent over.

•  (DeleteCell, domainid, cellid, timestamp, floodarea)

where the parameters are as in the UpdateCell message. It is sent by a reporting gateway
when it becomes non-reporting (because its cell expanded to include a gateway with lower
id), to inform viewservers to delete the gateway's old cell.

The state maintained by a gateway g is listed in Figure 4. Note that LocalViewservers_g and
LocalGateways_g can be empty. IntraDomainRT_g contains a route (next-hop or source) for every
reachable node of the domain and for every reachable neighbor domain cell. (IntraDomainRT_g is a
view in case of a link-state routing protocol or a distance table in case of a distance-vector
routing protocol.) We assume that consecutive reads of Clock_g return increasing values.

The state maintained by a viewserver x is listed in Figure 5. DView_x is the dynamic part of
x's view. For each domain cell known to x (we use A:g to denote the cell g of domain A), DView_x
stores a timestamp field which equals the
Constants:
  LocalViewservers_g. (⊆ TotalIds). Set of viewservers in g's domain.
  LocalGateways_g. (⊆ TotalIds). Set of gateways in g's domain excluding g.
  AdjForeignGateways_g. (⊆ TotalIds). Set of adjacent gateways in other domains.
  FloodArea_g. (⊆ DomainIds). The flood area of the domain (includes domain of g).
Variables:
  IntraDomainRT_g. Intra-domain routing table of g. Initially contains no entries.
  CellId_g : NodeIds. The id of g's cell. Initially = ∞
  Clock_g : Integer. Clock of g.

Figure 4: State of a gateway g.


Constants:
  Precinct_x. Precinct of x.
  SView_x. Static view of x.
  TimeToDie_x : Integer. Time-to-die value.
Variables:
  DView_x. Dynamic view of x.
    = {(A:g, timestamp, expirytime, deleted,
        {(B:b, cost) : B ∈ DomainNeighbors(A) ∧ b ∈ NodeIds ∪ {*}}) :
       A ∈ Precinct_x ∧ g ∈ NodeIds}
  Clock_x : Integer. Clock of x.

Figure 5: State of a viewserver x.

largest timestamp received for this domain cell, an expirytime field which equals the end of the
time-to-die period for this domain cell, a deleted field which marks whether or not the domain cell
is deleted, and a cost set which indicates a cost for every neighbor domain cell whose domain is in
SView_x. The cell-id of a neighbor domain equals * if no cell of the neighbor domain is reachable.
The events of gateway g and a viewserver x are specified in Appendix A.
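
The handling of timestamps, the time-to-die period, and the deleted mark can be sketched as below. The authoritative event specification is in Appendix A, which is not reproduced here, so this is only a plausible reading: the class and method names (`DynamicView`, `on_update`, `on_delete`, `expire`) are hypothetical, and clocks are modelled as plain integers.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

CellKey = Tuple[str, int]  # (domain id, cell id)


@dataclass
class CellEntry:
    timestamp: int
    expirytime: int
    deleted: bool = False
    costs: Dict[CellKey, float] = field(default_factory=dict)


class DynamicView:
    """Hypothetical container for the dynamic view DView_x of a viewserver."""

    def __init__(self, time_to_die: int) -> None:
        self.time_to_die = time_to_die
        self.entries: Dict[CellKey, CellEntry] = {}

    def on_update(self, key: CellKey, timestamp: int,
                  costs: Dict[CellKey, float], now: int) -> bool:
        entry = self.entries.get(key)
        if entry is not None and timestamp <= entry.timestamp:
            return False  # stale or duplicate flooded message: ignore
        self.entries[key] = CellEntry(timestamp, now + self.time_to_die,
                                      False, dict(costs))
        return True

    def on_delete(self, key: CellKey, timestamp: int, now: int) -> bool:
        entry = self.entries.get(key)
        if entry is not None and timestamp <= entry.timestamp:
            return False
        # Keep a marked entry so an older UpdateCell cannot re-add the cell.
        self.entries[key] = CellEntry(timestamp, now + self.time_to_die, True)
        return True

    def expire(self, now: int) -> None:
        # Drop entries (including deleted marks) past their time-to-die period.
        self.entries = {k: e for k, e in self.entries.items()
                        if e.expirytime > now}
```

Keeping the marked entry until it expires is what preserves the timestamp needed to reject late, out-of-order UpdateCell messages for a cell that has just been deleted.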

Changes  to  View-Query  Protocol 

We  now  enumerate  the  changes  needed  to  adapt  the  view-query  protocol  to  the  dynamic  case  (the 
formal  specification  is  omitted  for  space  reasons). 

Due to link and node failures, RequestView and ReplyView packets can get lost. Hence, the
source may never receive a ReplyView packet after it initiates a request. Thus, the source should
try again after a time-out period.

When  a  viewserver  receives  a  RequestView  message,  in  the  static  case  it  replies  with  its  view 
if  the  destination  domain  is  in  its  precinct.  Now,  because  domain-level  edges  can  fail,  it  must  also 
check  its  dynamic  view  and  reply  with  its  views  only  if  its  dynamic  view  contains  a  path  to  the 
destination.  Similarly  during  forwarding  of  RequestView  and  ReplyView  packets,  a  viewserver, 
while  checking  whether  a  domain  is  in  its  view,  should  also  check  if  its  dynamic  view  contains  a 
path  to  it. 

Finally,  when  a  viewserver  sends  a  message  to  a  node  whose  domain  is  partitioned,  it  should 
send  a  copy  of  the  message  to  each  cell  of  the  domain.  This  is  because  a  viewserver  does  not  know 
which  cell  contains  the  node. 
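
The extra check on the dynamic view can be illustrated with a simple reachability test over edges that are currently up. This is a sketch under assumptions: the dynamic view is represented as a nested dictionary of edge costs, a cost of infinity marks a failed domain-level edge, and `has_live_path` is a hypothetical name.

```python
import math
from collections import deque
from typing import Dict, Hashable


def has_live_path(costs: Dict[Hashable, Dict[Hashable, float]],
                  source: Hashable, destination: Hashable) -> bool:
    """Breadth-first search over edges whose cost is finite (i.e. not down)."""
    frontier, seen = deque([source]), {source}
    while frontier:
        node = frontier.popleft()
        if node == destination:
            return True
        for neighbor, cost in costs.get(node, {}).items():
            if cost != math.inf and neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return False
```

A viewserver following the rule above would reply with its view, or treat a domain as being "in its view" during forwarding, only when such a test succeeds.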

5  Evaluation 

Many  inter-domain  routing  protocols  have  been  proposed,  based  on  various  kinds  of  hierarchies. 
How  do  these  protocols  compare  against  each  other  and  against  the  simple  approach?  To  answer  this 
question, we need a model in which we can define internetwork topologies, policy/ToS constraints,
inter-domain  routing  hierarchies,  and  evaluation  measures  (e.g.  memory  and  time  requirements) 
for  inter-domain  routing  protocols. 

In  this  section,  we  first  present  such  a  model,  and  then  use  the  model  to  evaluate  our  viewserver 
hierarchy  and  compare  it  to  the  simple  approach.  Our  evaluation  measures  are  the  amount  of 
memory  required  at  the  source  and  at  the  routers,  the  amount  of  time  needed  to  construct  a  path, 
and  the  number  of  paths  found  out  of  the  total  number  of  valid  paths. 

Even though the model described here can be applied to other inter-domain routing protocols,
we have not done so, and hence have not compared them against our viewserver hierarchy. This
is because of lack of time, and because precise definitions of the hierarchies in these protocols are
not available. For example, to do a fair evaluation of IDPR [13], we need precise guidelines for
how to group domains into super-domains, and how to choose between the union and intersection
methods when defining policy/ToS constraints of super-domains. In fact, these protocols have not
been evaluated in a way that we can compare them to the viewserver hierarchy. To the best of our
knowledge, this paper is the first to evaluate a hierarchical inter-domain routing protocol against
explicitly stated policy constraints.


5.1  Evaluation  Model 

We first describe our method of generating topologies and policy/ToS constraints. We then describe
the  evaluation  measures. 

Generating  Internetwork  Topologies 

For  our  purposes,  an  internetwork  topology  is  a  directed  graph  where  the  nodes  correspond  to 
domains  and  the  edges  correspond  to  domain-level  connections.  However,  an  arbitrary  graph  will 
not  do.  The  topology  should  have  the  characteristics  of  a  real  internetwork,  like  the  Internet.  That 
is, it should have backbones, regionals, MANs, LANs, etc.; these should be connected hierarchically
(e.g.  regionals  to  backbones),  but  “non-hierarchical”  connections  (e.g.  “back-doors”)  should  also 
be  present. 

For  brevity,  we  refer  to  backbones  as  class  0  domains,  regionals  as  class  1  domains,  metropolitan- 
area  domains  and  providers  as  class  2  domains,  and  campus  and  local-area  domains  as  class  3 
domains.  A  (strictly)  hierarchical  interconnection  of  domains  means  that  class  0  domains  are 
connected to each other, and for i > 0, class i domains are connected to class i - 1 domains.
As  mentioned  above,  we  also  want  some  “non-hierarchical”  connections,  i.e.,  domain-level  edges 
between  domains  irrespective  of  their  classes  (e.g.  from  a  campus  domain  to  another  campus 
domain  or  to  a  backbone  domain). 

In reality, domains span geographical regions and domain-level edges are usually between
domains that are geographically close (e.g. the University of Maryland campus domain is connected to
the SURANET regional domain, which is on the east coast). A class i domain usually spans a larger
geographical region than a class i+1 domain. To generate such interconnections, we associate a
“region” attribute to each domain. The intention is that two domains with the same region are
geographically close.

The region of a class i domain has the form r_0.r_1.···.r_i, where the r_j's are integers. For
example, the region of a class 3 domain can be 1.2.3.4. For brevity, we refer to the region of a
class i domain as a class i region.

Note that regions have their own hierarchy. Class 0 regions are the top level regions. We say
that a class i region r_0.r_1.···.r_i is contained in the class i - 1 region r_0.r_1.···.r_{i-1} (where i > 0).
Containment is transitive. Thus region 1.2.3.4 is contained in regions 1.2.3, 1.2 and 1.


Figure  6:  Regions 

Given any pair of domains, we classify them as local, remote or far, based on their regions.
Let X be a class i domain and Y a class j domain, and (without loss of generality) let i < j.
X and Y are local if they are in the same class i region. For example in Figure 6, A is local to
B, C, J, K, M, N, O, P, and Q. X and Y are remote if they are not in the same class i region but
they are in the same class i - 1 region, or if i = 0. For example in Figure 6, some of the domains
A is remote to are D, E, F, and L. X and Y are far if they are not local or remote. For example
in Figure 6, A is far to J.

We refer to a domain-level edge as local (remote, or far) if the two domains it connects are local
(remote, or far).
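
The local/remote/far classification can be written down compactly if a region is modelled as a tuple of integers, so that a class i region has i + 1 components. The sketch below is an illustration under that assumption; `classify` is a hypothetical name, and the function also accepts two domains of the same class.

```python
from typing import Tuple

Region = Tuple[int, ...]  # a class i region has i + 1 components


def classify(region_x: Region, region_y: Region) -> str:
    """Return 'local', 'remote' or 'far' for two domains given their regions."""
    # Without loss of generality let X be the domain with the smaller class number.
    if len(region_x) > len(region_y):
        region_x, region_y = region_y, region_x
    i = len(region_x) - 1  # class of X
    if region_y[:i + 1] == region_x:
        return "local"      # both lie in the same class i region
    if i == 0 or region_y[:i] == region_x[:i]:
        return "remote"     # same class i-1 region, or X is a class 0 domain
    return "far"
```

For instance, a class 3 domain in region 1.2.3.4 is local to a class 2 domain in region 1.2.3, because region 1.2.3.4 is contained in region 1.2.3.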

We use the following procedure to generate internetwork topologies:

•  We  first  specify  the  number  of  domain  classes,  and  the  number  of  domains  in  each  class. 

•  We next specify the regions. Note that the number of region classes equals the number of
domain classes. We specify the number of class 0 regions. For each class i > 0, we specify a
branching factor, which creates that many class i regions in each class i - 1 region. (That is,
if there are two class 0 regions and the class 1 branching factor equals three, then there are
six class 1 regions.)

•  For each class i, we randomly map the class i domains into the class i regions. Note that
several  domains  can  be  mapped  to  the  same  region,  and  some  regions  may  have  no  domain 
mapped  into  them. 

•  For every class i and every class j, j > i, we specify the number of local, remote and far
edges  to  be  introduced  between  class  i  domains  and  class  j  domains.  The  end  points  of  the 
edges  are  chosen  randomly  (within  the  specified  constraints). 

We ensure that the internetwork topology is connected by ensuring that the subgraph of class
0 domains is connected, and each class i domain, for i > 0, is connected to a local class i - 1
domain.
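
The region and domain-placement steps of the procedure above can be sketched as follows. This is an illustrative sketch only: regions are modelled as integer tuples, and `make_regions` and `assign_domains` are hypothetical helper names. Edge generation and the connectivity check are not shown.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Region = Tuple[int, ...]


def make_regions(num_class0: int, branching: List[int]) -> List[List[Region]]:
    """regions[i] lists the class i regions; branching[i-1] is the class i factor."""
    regions: List[List[Region]] = [[(r,) for r in range(num_class0)]]
    for factor in branching:
        regions.append([parent + (child,)
                        for parent, child in product(regions[-1], range(factor))])
    return regions


def assign_domains(num_domains: List[int], regions: List[List[Region]],
                   rng: random.Random) -> Dict[Tuple[int, int], Region]:
    """Randomly map domain (class i, index k) to a class i region."""
    return {(i, k): rng.choice(regions[i])
            for i, count in enumerate(num_domains) for k in range(count)}
```

With two class 0 regions and a class 1 branching factor of three, `make_regions(2, [3])` yields six class 1 regions, matching the example in the procedure. Passing an explicit `random.Random` instance keeps a generated topology reproducible.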

Choosing Policy/ToS Constraints

We chose a simple scheme to model policy/ToS constraints. Each domain is assigned a color: green
or red. For each domain class, we specify the percentage of green domains in that class, and then
randomly choose a color for each domain in that class.

A valid route from a source to a destination is one that does not visit any red intermediate
domains; the source and destination are allowed to be red. Notice that this models transit policy/ToS
constraints. We are working on extending this model to source policy/ToS constraints.
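
Under this color model, the shortest valid path can be found with a breadth-first search that never expands a red intermediate domain. The sketch below is one way to express that; the adjacency-list representation and the name `shortest_valid_path_length` are assumptions made for illustration, and path length is counted in domain-level edges.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Optional, Set


def shortest_valid_path_length(edges: Dict[Hashable, Iterable[Hashable]],
                               green: Set[Hashable],
                               source: Hashable,
                               destination: Hashable) -> Optional[int]:
    """Length (in domain-level edges) of the shortest valid path, or None."""
    distance = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == destination:
            return distance[node]
        # Only the source and green domains may carry (transit) traffic.
        if node != source and node not in green:
            continue
        for neighbor in edges.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                frontier.append(neighbor)
    return None
```

A red domain can still be reached as the destination; it is only barred from forwarding traffic onward.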

Computing  Evaluation  Measures 

The evaluation measures of most interest for an inter-domain routing protocol are its memory and
time requirements, and the number of valid paths it finds (and their lengths) in comparison to
the number of available valid paths (and their lengths) in the internetwork (e.g. could it find the
shortest valid path in the internetwork).

The  only  analysis  method  we  have  at  present  is  to  numerically  compute  the  evaluation  measures 
for  a  variety  of  source-destination  pairs.  Because  we  use  internetwork  topologies  of  large  sizes,  it  is 
not  feasible  to  compute  for  all  possible  source-destination  pairs.  We  randomly  choose  a  set  of  source- 
destination  pairs  that  satisfy  the  following  conditions:  (1)  the  source  and  destination  domains  are 
different,  and  (2)  there  exists  a  valid  path  from  the  source  domain  to  the  destination  domain  in 
the  internetwork  topology.  (Note  that  the  simple  scheme  would  always  find  such  a  path.) 

For  a  source-destination  pair,  we  refer  to  the  length  of  the  shortest  valid  path  in  the  internetwork 
topology  as  the  shortest-path  length.  Since  the  number  of  paths  between  a  source-destination  pair 
is  potentially  very  large  (factorial  in  the  number  of  domains),  and  we  are  not  interested  in  the 
paths  that  are  too  long,  we  only  count  the  number  of  paths  whose  lengths  are  not  more  than  the 
shortest-path-length  plus  2. 
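
Counting valid paths up to a length bound can be sketched with a depth-limited search. The assumptions here are that only simple paths (no repeated domain) are counted and that the caller passes the bound, for example the shortest-path length plus 2; `count_valid_paths` is a hypothetical name.

```python
from typing import Dict, Hashable, Iterable, Set


def count_valid_paths(edges: Dict[Hashable, Iterable[Hashable]],
                      green: Set[Hashable], source: Hashable,
                      destination: Hashable, max_length: int) -> Dict[int, int]:
    """Count simple valid paths by length, for lengths up to max_length."""
    counts: Dict[int, int] = {}

    def extend(node: Hashable, visited: frozenset, length: int) -> None:
        if node == destination:
            counts[length] = counts.get(length, 0) + 1
            return
        # Stop at the length bound, and never transit a red intermediate domain.
        if length == max_length or (node != source and node not in green):
            return
        for neighbor in edges.get(node, ()):
            if neighbor not in visited:
                extend(neighbor, visited | {neighbor}, length + 1)

    extend(source, frozenset([source]), 0)
    return counts
```

The returned dictionary maps each path length to the number of valid paths of that length, which is the form in which the results are tabulated later.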

The  evaluation  measures  described  above  are  protocol  independent.  However,  there  are  also 
important  evaluation  measures  that  are  protocol  dependent  (e.g.  number  of  levels  traversed  in 
some particular hierarchy). Because of this we postpone the precise definitions of the evaluation
measures to the next subsection (their definition depends on the viewserver hierarchy).

5.2  Application  to  Viewserver  Protocol 

We  have  used  the  above  model  to  evaluate  our  viewserver  protocol  for  several  different  viewserver 
hierarchies  and  query  methods.  We  first  describe  the  different  viewserver  schemes  evaluated.  Please 
refer  to  Figure  6  in  the  following  discussion. 

The first viewserver scheme is referred to as base. It has exactly one viewserver in each domain.
Each viewserver is identified by its domain-id. The domains in a viewserver’s precinct consist of
its domain and the neighboring domains. The edges in the viewserver’s view consist of the edges
between the domains in the precinct, and edges outgoing from domains in the precinct to domains
not in the precinct. For example, the precinct of viewserver A (i.e. the viewserver in domain A)
consists of domains A, B, J; the edges in the view of viewserver A consist of domain-level edges
(A, B), (A, J), (B, J), (J, M), (J, K), (J, F), and (J, B).
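
The base precinct and view can be computed directly from the domain-level edge set. The sketch below assumes edges are given as directed pairs (u, v), with a bidirectional connection listed in both directions; `base_view` is a hypothetical name and the small graph in the usage note is made up, not taken from Figure 6.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def base_view(domain: Hashable, edges: Set[Edge]) -> Tuple[Set[Hashable], Set[Edge]]:
    """Precinct and view edges of the viewserver of `domain` in the base scheme."""
    neighbors = ({v for u, v in edges if u == domain}
                 | {u for u, v in edges if v == domain})
    precinct = {domain} | neighbors
    # Keep every edge that leaves a precinct domain: edges inside the precinct
    # and edges from the precinct to outside domains.
    view_edges = {(u, v) for u, v in edges if u in precinct}
    return precinct, view_edges
```

For a chain A - J - M - P with edges listed in both directions, the viewserver of A gets precinct {A, J} and sees the edge (J, M), but not (M, J) or anything beyond M.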

As for the viewserver hierarchy, a viewserver’s level is defined to be the class of its domain. That
is, a viewserver in a class i domain is a level i viewserver. For each level i viewserver, i > 0, its
parent viewserver is chosen randomly from the level i - 1 viewservers in the parent region such that
there is a domain-level edge between the viewserver’s domain and the parent viewserver’s domain.
For example, for viewserver C, we can pick viewserver J or K; suppose we pick J. For viewserver
J, we have no choice but to pick M (N and O are not connected to J). For M, we pick P (out of
P and Q).

We use only one address for each domain. The viewserver-address of a stub domain is the
concatenation of four viewserver (i.e. domain) ids. Thus, the address of A is P.M.J.A. Similarly, the
address of E is P.M.K.E. To obtain a route between A and E, it suffices to obtain views of
viewservers A, J, K, E.

The second viewserver scheme is referred to as base-QT (where the QT stands for “query up to
top”). It is identical to base except that during the query protocol all the viewservers in the source
and the destination addresses are queried. That is, to obtain a route between A and E, the views
of A, J, M, P, K, E are obtained.
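
The set of viewservers contacted by a QT scheme follows directly from the two addresses. The sketch below reproduces the example above; it assumes addresses are tuples ordered from the top-level viewserver down, and it lists a viewserver shared by both addresses only once, which is an assumption about how queries are counted. `qt_viewservers` is a hypothetical name.

```python
from typing import List, Sequence


def qt_viewservers(s_address: Sequence[str], d_address: Sequence[str]) -> List[str]:
    """Viewservers queried by a QT ("query up to top") scheme, without repeats."""
    queried: List[str] = []
    # Walk the source address bottom-up, then the destination address top-down.
    for vs in list(reversed(s_address)) + list(d_address):
        if vs not in queried:
            queried.append(vs)
    return queried
```

For the addresses P.M.J.A and P.M.K.E this gives A, J, M, P, K, E, the six views named in the text.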

The third viewserver scheme is referred to as locals. It is identical to base except that now a
viewserver’s precinct also contains domains that have the same region as the viewserver’s domain.
That is, the precinct of viewserver A has the domains A, B, J, C. Note that in this scheme a
viewserver’s view is not necessarily connected. For example, if the edge (C, J) is removed, the view
of viewserver A is no longer connected. (In Section 3, we said that the view of a viewserver should
be connected. Here we have relaxed this condition to simplify testing.)

The  fourth  viewserver  scheme  is  referred  to  as  locals-QT.  It  is  identical  to  locals  except  that 
during  the  query  protocol  all  the  viewservers  in  the  source  and  the  destination  addresses  are  queried. 

The fifth viewserver scheme is referred to as vertex-extension. It is identical to base except
that viewserver precincts are extended as follows: Let P denote the precinct of a viewserver in the
base scheme. For each domain X in P, if there is an edge from domain X to domain Y and Y
is not in P, domain Y is added to the precinct; among Y’s edges, only the ones to domains in P
are added to the view. In the example, domains M, K, F, E are added to the precinct of A, but
outgoing edges of these domains to other domains are not included (e.g. (F, G) is not included).
The advantage of this scheme is that even though it increases the precinct size by a factor which
is potentially greater than 2, it increases the number of edges stored in the view by a factor less
than 2. (In fact, if the same edge cost and edge policies are used for both directions of domain-level
edges, then the only other information that needs to be stored by the viewservers is the policy
constraints of the newly added domains.)
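
The vertex-extension construction can be sketched as a one-hop extension of a base precinct. As before, edges are assumed to be directed pairs and `vertex_extension` is a hypothetical name; the graph in the usage note is invented for illustration.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def vertex_extension(precinct: Set[Hashable],
                     edges: Set[Edge]) -> Tuple[Set[Hashable], Set[Edge]]:
    """Extend a base precinct P by one hop; return (new precinct, view edges)."""
    added = {v for u, v in edges if u in precinct and v not in precinct}
    # Edges leaving P (as in base) plus edges from added domains back into P.
    view_edges = {(u, v) for u, v in edges
                  if u in precinct or (u in added and v in precinct)}
    return precinct | added, view_edges
```

With precinct {A, J} and edges A - J, J - F, F - G (both directions), domain F joins the precinct and the edge (F, J) joins the view, while (F, G) stays out. If costs are symmetric, (F, J) carries no new information beyond (J, F), which is the reason given above for the modest growth in stored edges.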

The sixth viewserver scheme is referred to as full-QT. It is constructed in the same way as
vertex-extension except that the locals scheme is used instead of the base scheme to define the P in
the construction. In full-QT, during the query protocol all the viewservers in the source and the
destination addresses are queried.

In  all  the  above  viewserver  schemes,  we  have  used  the  same  hierarchy  for  both  domain  classes 
and  viewservers.  In  practice,  not  all  domains  need  to  have  a  viewserver,  and  a  viewserver  hierarchy 
different  from  the  domain  class  hierarchy  can  be  deployed.  However,  there  is  an  advantage  of 
having  a  viewserver  in  each  domain;  that  is,  source  nodes  do  not  require  fixed  domain-level  source 
routes  to  their  parent  viewservers  (in  the  view-query  protocol).  This  reduces  the  amount  of  hand 
configuration required. In fact, the base scheme does not require any hand configuration: viewservers
can  decide  their  precincts  from  the  intra-domain  routing  tables,  and  nodes  can  use  intra-domain 
routes  to  reach  parent  viewservers. 

Results  for  Internetwork  1 

The  parameters  of  the  first  internetwork  topology,  referred  to  as  Internetwork  1,  are  shown  in 
Table  1. 

Our  evaluation  measures  were  computed  for  a  (randomly  chosen  but  fixed)  set  of  1000  source- 
destination  pairs.  For  brevity,  we  use  spl  to  refer  to  the  shortest-path  length  (i.e.  the  length  of 
the shortest valid path in the internetwork topology). The minimum spl of these pairs was 2, the
maximum spl was 13, and the average spl was 6.8. Table 2 lists for each viewserver scheme (1) the
minimum, average and maximum precinct sizes, (2) the minimum, average and maximum merged
view sizes, and (3) the minimum, average and maximum number of viewservers queried.

The precinct size indicates the memory requirement at a viewserver. More precisely, the memory
requirement at a viewserver is O(precinct size × d), where d is the average number of neighbor
domains of a domain, except for the vertex-extension and full-QT schemes. In these schemes, the
memory requirement is increased by a factor less than two. Hence the vertex-extension scheme has
the same order of viewserver memory requirement as the base scheme and the full-QT scheme has
the same order of viewserver memory requirement as the locals scheme.

Class i   No. of    No. of    % of Green   Edges between Classes i and j
          Domains   Regions   Domains      Class j   Local   Remote   Far
0         10        4         0.80         0         8       6        0
1         100       16        0.75         0         190     20       0
                                           1         26      5        0
2         1000      64        0.70         0         100     0        0
                                           1         1060    40       0
                                           2         200     40       0
3         10000     256       0.20         0         100     0        0
                                           1         100     0        0
                                           2         10100   50       0
                                           3         50      50       50

(No. of Regions: the branching factor is 4 for all region classes.)

Table 1: Parameters of Internetwork 1.


Scheme             Precinct Size       Merged View Size      No. of Viewservers Queried
                   (min / avg / max)   (min / avg / max)     (min / avg / max)
base               2 / 3.2 / 68        7 / 71.03 / 101       3 / 7.51 / 8
base-QT            2 / 3.2 / 68        30 / 76.01 / 101      8 / 8.00 / 8
locals             2 / 52.0 / 103      3 / 95.40 / 143       2 / 7.42 / 8
locals-QT          2 / 52.0 / 103      43 / 101.86 / 143     8 / 8.00 / 8
vertex-extension   3 / 19.2 / 796      23 / 362.15 / 486     3 / 7.51 / 8
full-QT            11 / 102.9 / 796    228 / 396.80 / 519    8 / 8.00 / 8

Table 2: Precinct sizes, merged view sizes, and number of viewservers queried for Internetwork 1.

The merged view size indicates the memory requirement at a source; i.e. the memory requirement
at a source is O(merged view size × d) except for the vertex-extension and full-QT schemes.
Note that the source does not need to store information about red and non-transit domains. The
numbers in Table 2 take advantage of this.

The  number  of  viewservers  queried  indicates  the  communication  time  required  to  obtain  the 
merged  view  at  the  source.  Because  the  average  spl  is  6.8,  the  “real-time”  communication  time 
required to obtain the merged view at a source is slightly more than one round-trip time between
the  source  and  the  destination. 

As is apparent from Table 2, using a QT scheme increases the merged view size and the number
of  viewservers  queried  only  by  about  5%.  Using  a  locals  scheme  increases  the  merged  view  size 
by  about  30%.  Using  the  vertex-extension  scheme  increases  the  merged  view  size  by  5  times  (note 
that  the  amount  of  actual  memory  needed  increases  only  by  a  factor  less  than  2).  The  number  of 
viewservers  queried  in  the  locals  scheme  is  less  than  the  number  of  viewservers  queried  in  the  base 
scheme.  This  is  because  the  viewservers  in  the  locals  scheme  have  bigger  precincts,  and  a  path  from 
the  source  to  the  destination  can  be  found  using  fewer  views. 
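
The rounded figures quoted above can be checked against the averages in Table 2. The short calculation below copies those averages and computes the ratios relative to the base scheme; the variable and function names are ours and serve only this check.

```python
# Average merged view sizes and viewservers queried, copied from Table 2.
merged = {"base": 71.03, "base-QT": 76.01, "locals": 95.40,
          "locals-QT": 101.86, "vertex-extension": 362.15, "full-QT": 396.80}
queried = {"base": 7.51, "base-QT": 8.00, "locals": 7.42, "locals-QT": 8.00}


def ratio(table: dict, scheme: str, reference: str) -> float:
    """Average value of `scheme` relative to `reference`."""
    return table[scheme] / table[reference]


qt_view_growth = ratio(merged, "base-QT", "base") - 1           # roughly 0.07
qt_query_growth = ratio(queried, "base-QT", "base") - 1         # roughly 0.065
locals_view_growth = ratio(merged, "locals", "base") - 1        # roughly 0.34
vertex_view_factor = ratio(merged, "vertex-extension", "base")  # roughly 5.1
```

The computed values are of the same order as the approximate percentages given in the text, which round them more coarsely.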

Table 3 shows the average number of spl, spl + 1, spl + 2 length paths found for a
source-destination pair by the simple approach and by the viewserver schemes. All the viewserver schemes
are very close to the simple approach. The vertex-extension and full-QT schemes are especially close
(they found 98% of all paths). Table 3 also shows the number of pairs for which the viewserver
schemes did not find a path (ranging from 1.4% to 5.9% of the source-destination pairs), and
the number of pairs for which the viewserver schemes found longer paths. For these pairs, more
viewserver addresses need to be tried. Note that the locals and vertex-extension schemes decrease the
number of these pairs substantially (adding QT yields further improvement). Our policy constraints
are source and destination domain independent. Hence, even a class 2 domain, if it is red, cannot
carry traffic to a class 3 domain to which it is connected. We believe that these figures would
improve with policies that are dependent on source and destination domains.

As  is  apparent  from  Table  3  and  Table  2,  the  locals  scheme  does  not  find  many  more  extra 
paths  than  the  base  scheme  even  though  it  has  larger  precinct  and  merged  view  sizes.  Hence  it  is 
not  recommended.  The  vertex-extension  scheme  is  the  best,  but  even  base  is  adequate  since  it  finds 
many  paths. 

We have repeated the above evaluations for two other internetworks and obtained similar
conclusions. The results are in Appendix B.

6  Concluding  Remarks 

We presented a hierarchical inter-domain routing protocol that (1) satisfies policy and ToS
constraints, (2) adapts to dynamic topology changes including failures that partition domains, and
(3) scales well to a large number of domains.

                        Number of paths found         No. of pairs    No. of pairs
Scheme                spl      spl + 1    spl + 2     with no path    with longer paths
simple                2.51     18.48      131.01      N/A             N/A
base                  2.41     15.84       99.42      59              3 by 1.33 hops
base-QT               2.41     15.86      100.16      54              3 by 1.33 hops
locals                2.41     16.17      103.54      29              3 by 1 hop
locals-QT             2.41     16.29      105.02      20              3 by 1 hop
vertex-extension      2.51     18.38      128.19      22              0 by 0 hops
full-QT               2.50     18.40      128.90      14              0 by 0 hops

Table 3: Number of paths found for Internetwork 1.

Our protocol uses partial domain-level views to achieve scaling in space requirement. It floods
domain-level topological changes over a flood area to achieve scaling in communication requirement.

It does not abstract domains into superdomains; hence it does not lose any domain-level detail
in ToS and policy information. It merges a sequence of partial views to obtain domain-level source
routes between nodes which are far away. The number of views that need to be merged is bounded
by twice the number of levels in the hierarchy.

To evaluate and compare inter-domain routing protocols against each other and against the
simple approach, we presented a model in which one can define internetwork topologies, policy/ToS
constraints, inter-domain routing hierarchies, and evaluation measures. We applied this model to
evaluate our viewserver hierarchy and compared it to the simple approach. Our results indicate
that the viewserver hierarchy finds many short valid paths and reduces the memory
requirement by two orders of magnitude.

Our protocol recovers from fail-stop failures of viewservers and gateways. When a viewserver
fails, an address which includes the viewserver’s id becomes useless. This deficiency can be overcome
by replicating each viewserver at different nodes of the domain (in this case a viewserver fails only
if all nodes implementing it fail). This replication scheme requires viewserver ids to be independent
of node ids, which can be easily accomplished.*

* For example, if the node-ids of the nodes implementing a viewserver share a prefix, this prefix can be used as the

The only drawback of our protocol is that to obtain a domain-level source route, views are
merged at (or prior to) the connection (or flow) setup, thereby increasing the setup time. This
drawback is not unique to our scheme [8, 13, 6, 10].

There are several ways to reduce the setup overhead. First, domain-level source routes to
frequently used destinations can be cached. The caching period would depend on the ToS
requirement of the applications and the frequency of domain-level topology changes. For example, the
period can be long for electronic mail since it does not require shortest paths.

Second,  views  of  frequently  queried  viewservers  can  be  replicated  at  “mirror”  viewservers  in  the 
source  domain.  A  viewserver  would  periodically  update  the  views  of  its  mirror  viewservers. 

Third,  connection  setup  also  involves  traversing  the  name  server  hierarchy  (to  obtain  destination 
addresses  from  its  names).  By  integrating  the  name  server  hierarchy  with  the  viewserver  hierarchy, 
we  may  be  able  to  do  both  operations  simultaneously.  This  requires  further  investigation. 
A  View-Update Protocol Event Specifications

The events of gateway g are specified in Figure 7. When a gateway g recovers, CellId_g is set to
nodeid(g). Thus, when g next executes Update_g, it sends either an UpdateCell or a DeleteCell
message to viewservers, depending on whether or not it is still the minimum id gateway in its cell.
Sending a DeleteCell message is essential because, prior to the failure, g may have been the smallest id
gateway in its cell. Hence, some viewservers may still contain an entry for its old domain cell.

The events of a viewserver x are specified in Figure 8. Note that when x adds an entry to
DView_x (upon receiving an UpdateCell message), it selectively chooses a subset of neighbors from
the cost set in the packet to include only the neighbor domains which are in SView_x. When
a viewserver x recovers, DView_x is set to {}. Its view becomes up-to-date as it receives new
information from reporting gateways (and removes false information with the time-to-die period).
Update_g  {Executed periodically and also optionally upon a change in IntraDomainRT_g}
    {Determines the id of g's cell and initiates UpdateCell and DeleteCell messages if needed.}
    OldCellId := CellId_g;
    CellId_g := compute cell id using LocalGateways_g and IntraDomainRT_g;
    if nodeid(g) = CellId_g then
        ncostset := compute costs for each neighbor domain cell using IntraDomainRT_g;
        flood_g((UpdateCell, domainid(g), CellId_g, Clock_g, FloodArea_g, ncostset));
    endif
    if nodeid(g) = OldCellId ≠ CellId_g then
        flood_g((DeleteCell, domainid(g), nodeid(g), Clock_g, FloodArea_g));
    endif

Receive_g(packet)  {either an UpdateCell or a DeleteCell packet}
    flood_g(packet)

where procedure flood_g(packet)
    if domainid(g) ∈ packet.floodarea then
        {remove domain of g from the flood area to avoid infinite exchange of the same message.}
        packet.floodarea := packet.floodarea − {domainid(g)};
        for all h ∈ LocalGateways_g ∪ LocalViewservers_g do
            Send(packet) to h using ();
    endif
    for all h ∈ AdjForeignGateways_g ∧ domainid(h) ∈ packet.floodarea do
        Send(packet) to h;

Gateway Failure Model: A gateway can undergo failures and recoveries at any time. We assume failures
are fail-stop (i.e. a failed gateway does not send erroneous messages). When a gateway g recovers, CellId_g
is set to nodeid(g).


Figure 7: View-update protocol: Events of a gateway g.
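
As an illustration of the flooding procedure in Figure 7, the following is a minimal Python sketch. The class and field names (`Packet`, `Gateway`, `Viewserver`, `local_peers`, `foreign_gateways`) are hypothetical and chosen for this example only, and message delivery is modelled as a direct method call rather than a network send.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    kind: str              # "UpdateCell" or "DeleteCell"
    domain_id: str         # domain of the originating gateway
    cell_id: str
    timestamp: int
    flood_area: frozenset  # ids of domains that still have to be covered


class Viewserver:
    """Stand-in receiver that only records the packets it gets."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.received = []

    def receive(self, packet):
        self.received.append(packet)


class Gateway:
    def __init__(self, node_id, domain_id):
        self.node_id = node_id
        self.domain_id = domain_id
        self.local_peers = []       # local gateways and local viewservers
        self.foreign_gateways = []  # adjacent gateways in other domains
        self.received = []

    def receive(self, packet):
        self.received.append(packet)
        self.flood(packet)

    def flood(self, packet):
        if self.domain_id in packet.flood_area:
            # Remove our own domain so the same message is not exchanged forever.
            packet = replace(packet, flood_area=packet.flood_area - {self.domain_id})
            for h in self.local_peers:
                h.receive(packet)
        # Forward only to adjacent domains that are still in the flood area.
        for h in self.foreign_gateways:
            if h.domain_id in packet.flood_area:
                h.receive(packet)
```

Because every hop removes its own domain from the flood area before forwarding, the flood area shrinks monotonically and the exchange terminates.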


B  Results  for  Other  Internetworks 


Results  for  Internetwork  2 

The parameters of the second internetwork topology, referred to as Internetwork 2, are the same
as the parameters of Internetwork 1 (a different seed is used for the random number generation).

Our evaluation measures were computed for a set of 1000 source-destination pairs. The minimum
spl of these pairs was 2, the maximum spl was 13, and the average spl was 7.2.

Table 4 and Table 5 show the results. Similar conclusions to Internetwork 1 hold for
Internetwork 2. In Table 5, the reason that the locals and QT schemes have more pairs with longer paths than
the base scheme is that these schemes found some paths (which are not shortest) for some pairs for
which the base scheme did not find any path.
Receive_x(UpdateCell, did, cid, ts, FloodArea, ncset)
    if did ∈ Precinct_x then
        if ∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x ∧
                ts > timestamp then  {received is more recent; delete the old one}
            delete (did:cid, timestamp, expirytime, deleted, ncostset) from DView_x;
        endif
        if ¬∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x then
            choose ncostset from ncset using SView_x;
            insert (did:cid, ts, Clock_x + TimeToDie_x, false, ncostset) to DView_x;
        endif
    endif

Receive_x(DeleteCell, did, cid, ts, floodarea)
    if did ∈ Precinct_x then
        if ∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x ∧
                ts > timestamp then  {received is more recent; delete the old one}
            delete (did:cid, timestamp, expirytime, deleted, ncostset) from DView_x;
        endif
        if ¬∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x then
            insert (did:cid, ts, Clock_x + TimeToDie_x, true, {}) to DView_x;
        endif
    endif

Delete_x  {Executed periodically to delete entries older than the time-to-die period}
    for all (A:g, tstamp, expirytime, deleted, ncset) ∈ DView_x ∧ expirytime < Clock_x do
        delete (A:g, tstamp, expirytime, deleted, ncset) from DView_x;

Viewserver Failure Model: A viewserver can undergo failures and recoveries at any time. We assume
failures are fail-stop. When a viewserver x recovers, DView_x is set to {}.


Figure 8: View update events of a viewserver x.
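
The viewserver events of Figure 8 can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `Viewserver`, the dictionary layout of `dview`, and keying the neighbor cost set by neighbor domain id are choices made for this example, and the clock is passed in explicitly instead of being read from the node.

```python
class Viewserver:
    def __init__(self, precinct, sview, time_to_die):
        self.precinct = set(precinct)   # domains this viewserver keeps entries for
        self.sview = set(sview)         # domains allowed to appear as neighbors
        self.time_to_die = time_to_die
        self.dview = {}                 # (did, cid) -> entry dict

    def _drop_if_older(self, key, ts):
        # A received message with a newer timestamp replaces the stored entry.
        entry = self.dview.get(key)
        if entry is not None and ts > entry["timestamp"]:
            del self.dview[key]

    def _insert(self, key, ts, clock, deleted, ncostset):
        self.dview[key] = {"timestamp": ts, "expiry": clock + self.time_to_die,
                           "deleted": deleted, "ncostset": ncostset}

    def receive_update(self, did, cid, ts, ncset, clock):
        if did not in self.precinct:
            return
        key = (did, cid)
        self._drop_if_older(key, ts)
        if key not in self.dview:
            # Keep only the neighbor domains that are in SView.
            ncostset = {n: c for n, c in ncset.items() if n in self.sview}
            self._insert(key, ts, clock, False, ncostset)

    def receive_delete(self, did, cid, ts, clock):
        if did not in self.precinct:
            return
        key = (did, cid)
        self._drop_if_older(key, ts)
        if key not in self.dview:
            self._insert(key, ts, clock, True, {})

    def purge(self, clock):
        # Periodic removal of entries older than the time-to-die period.
        for key in [k for k, e in self.dview.items() if e["expiry"] < clock]:
            del self.dview[key]

    def recover(self):
        # After a fail-stop failure the view starts out empty.
        self.dview = {}
```

Note that a deletion is stored as an entry marked `deleted` rather than removed at once, so that a stale UpdateCell arriving later with an older timestamp cannot resurrect the cell.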


Scheme              Precinct Size       Merged View Size       No. of Viewservers Queried
base                2 / 3.2 / 76        4 / 66.62 / 96         3 / 7.55 / 8
base-QT             2 / 3.2 / 76        29 / 72.76 / 96        8 / 8.00 / 8
locals              3 / 69.8 / 149      4 / 101.32 / 148       2 / 7.36 / 8
locals-QT           3 / 69.8 / 149      35 / 110.32 / 152      8 / 8.00 / 8
vertex-extension    3 / 19.47 / 817     15 / 339.60 / 469      3 / 7.55 / 8
full-QT             11 / 135.2 / 817    186 / 402.51 / 503     8 / 8.00 / 8

Table 4: Precinct sizes, merged view sizes, and no. of viewservers queried for Internetwork 2.


Results  for  Internetwork  3 


Table 5: Number of paths found for Internetwork 2.

The parameters of the third internetwork topology, referred to as Internetwork 3, are shown in
Table 6. Internetwork 3 is more connected, more class 0, 1 and 2 domains are green, and more
class 3 domains are red. Hence, we expect more valid paths between source and destination pairs.

Our evaluation measures were computed for a set of 1000 source-destination pairs. The minimum
spl of these pairs was 2, the maximum spl was 10, and the average spl was 5.93.
Table 7 and Table 8 show the results. Similar conclusions to Internetwork 1 and 2 hold for
Internetwork 3.


Scheme              Precinct Size        Merged View Size       No. of Viewservers Queried
base                2 / 3.5 / 171        5 / 134.41 / 206       3 / 7.26 / 8
base-QT             2 / 3.5 / 171        55 / 154.51 / 206      8 / 8.00 / 8
locals              3 / 70.17 / 171      4 / 164.16 / 257       2 / 7.09 / 8
locals-QT           3 / 70.17 / 171      57 / 191.06 / 258      8 / 8.00 / 8
vertex-extension    5 / 34.17 / 1986     18 / 601.56 / 695      3 / 7.26 / 8
full-QT             14 / 155.5 / 1986    503 / 655.79 / 743     8 / 8.00 / 8

Table 7: Precinct sizes, merged view sizes, and no. of viewservers queried for Internetwork 3.


                         Number of paths found                  No. of pairs    No. of pairs
Scheme              spl            spl + 1        spl + 2       with no path    with longer paths
simple              [illegible]    [illegible]    368.97        N/A             N/A
base                2.83           24.25          178.08        17              11 by 1.09 hops
base-QT             2.87           25.53          193.41        12              8 by 1.12 hops
locals              2.87           [illegible]    [illegible]   21              8 by 1 hop
locals-QT           [illegible]    27.59          219.63        2               6 by 1 hop
vertex-extension    [illegible]    35.73          332.54        5               1 by 1 hop
full-QT             3.33           36.47          346.44        0               0 by 0 hops

Table 8: Number of paths found for Internetwork 3.


Figure 9 through Figure 11 show the number of spl, spl + 1 and spl + 2 length paths found by
the schemes as a function of spl (we only show results for spl values for which more than 10 pairs
were found). We do not include the base-QT, locals and locals-QT schemes since they are very close
to the base scheme. As expected, as spl increases, the number of paths for a source-destination pair
increases, and the gap between the simple scheme and the viewserver schemes increases.
Static time driven scheduling has been advocated for use in
Hard Real-Time systems and is particularly appropriate for
many embedded systems. The approaches taken for static
scheduling often use search techniques and may reduce the
search by using heuristics. In this paper we present a technique
for analyzing the temporal relations among the tasks,
based on non-preemptive schedulability. The relationships
can be used effectively to reduce the average complexity of
scheduling these tasks. They also serve as a basis for selective
preemption policies for scheduling by providing an early test
for infeasibility. We present examples and simulation results
to confirm the usefulness of temporal analysis as a phase prior
to scheduling.

1  Introduction 

Many safety critical real-time applications like process control,
embedded tactical systems for military applications, air-traffic
control, robotics etc. have stringent timing constraints
imposed on their computations due to the characteristics of
the physical system. A failure to observe the timing constraints
can result in intolerable system degradation and in
some cases it may have catastrophic consequences.

Scheduling is the primary means of ensuring the satisfaction
of timing constraints for such systems[1]. As a result, significant
effort has been invested in research on hard real-time
scheduling [2, 3, 4]. In this paper we discuss a scheduling
technique for static scheduling to guarantee timely execution
of time critical tasks.

The time driven scheduling model is being used by many
experimental systems, including MARS[5], MARUTI[6] and
Spring Kernel[7]. The static time driven scheduling technique
involves constructing a schedule offline, which may be represented
as a Gantt chart[8] or calendar[6] (Figure 1). Tasks
are invoked at run-time whenever they are scheduled to execute.
Such a scheduling model is particularly appropriate for
many embedded systems. Recent effort in this direction has
shown the viability of such an approach for practical real-time
applications[9].

*This  research  was  supported  in  part  by  ONR  and  DARPA  under 
contract  N00014-91-C-0195. 


Figure  1:  Gantt  Chart  or  Calendar 

The intractability of most scheduling problems has led to
approaches based on search techniques for scheduling of real-time
tasks. The feasibility of a task set is determined through
construction of a schedule; failure to construct a schedule
denotes infeasibility. Heuristics are often used as a means
of controlling the complexity of scheduling. In many cases,
heuristics perform well enough to result in an acceptable solution.

There has been little emphasis on the use of analytic techniques
to assist in time driven scheduling. Decomposition
scheduling[10] based on dominance properties of sequences[11]
uses analytic techniques to decompose a set of tasks into a sequence
of subsets. Significant reduction in average complexity
can be achieved if the set of tasks can be decomposed into a
large number of subsets, each having a small number of tasks.

In this paper, we present an analysis technique for time
driven scheduling based on the timing requirements of tasks.
The analysis results in the establishment of a set of temporal
relations between pairs of tasks based on a non-preemptive
scheduling model. These relations can be used by scheduling
algorithms to reduce the complexity of scheduling in the average
case, and as an early test for infeasibility. As a test for
infeasibility, it provides a good basis for policies using selective
preemption to enhance feasibility. When infeasibility is
not detected, the temporal relations may be used by a search
algorithm to effectively prune large portions of the search space,
thereby controlling the cost of scheduling.

2  Time  Driven  Scheduling 

The time driven scheduling approach constructs a calendar
for the set of tasks in the system. The tasks may be scheduled
preemptively or non-preemptively. The non-preemptive
scheduling problem for a uniprocessor is known to be
NP-Complete[12]. When the tasks are mutually independent and
can be preempted at any time, the earliest deadline first
policy is known to be optimal[13] and obviates the need for
non-preemptive scheduling. However, when tasks synchronize
using critical sections, the preemptive scheduling problem is
also known to be intractable (NP-Hard)[14].

In general, when the overhead of preemption is negligible,
the non-preemptive solutions form a subset of preemptive
solutions[8]. However, when tasks may interact with each
other, the non-preemptive models are simpler, easier to implement
and closer to reality[15]. They are also necessary for
certain scheduling domains like I/O scheduling and provide a
basis for selective preemption policies.

2.1  Task  Model 

We consider a set of $n$ tasks $P = \{\tau_i : i = 1, 2, \ldots, n\}$ to
be scheduled for execution on a single processor. Each task
$\tau_i$, abbreviated as $i$, is a 3-tuple $[r_i, c_i, d_i]$ denoting the ready
time, computation time and deadline respectively. The time
interval $[r_i, d_i]$ is called the timing window $W_i$ of task $\tau_i$, and
indicates the time interval during which the task can execute.
The computation time of each task is less than the window
length $|W_i|$.¹ All tasks are assumed to be independent for
simplicity of exposition even though such a requirement is
not necessary for the analysis.

In  a  hard  real-time  system,  processes  may  be  periodic  or 
sporadic  [14].  Such  a  set  of  processes  may  be  mapped  to  our 
scheduling  model  by  techniques  identified  in  [1,  14,  16]  and 
constructing  a  schedule  for  the  least  common  multiple  of  the 
periods  of  the  tasks. 

2.2  Non-Preemptive  Scheduling  Model 

A non-preemptive schedule is the mapping of each task $\tau_i$ in $P$
to a start time $s_i$. The task is then scheduled to run without
preemption in the time interval $[s_i, f_i]$, with its finish time
being $f_i = s_i + c_i$. A feasible schedule is a schedule in which
the following conditions are satisfied for each task $\tau_i$:

$r_i \le s_i$    (1)

$f_i \le d_i$    (2)

It is useful to consider a non-preemptive schedule as an
ordered sequence of the set of tasks. To get a maximally
packed schedule from a sequence $[\tau_1, \tau_2, \ldots, \tau_n]$, we can
recursively derive the start time $s_i$ and finish time $f_i$ of the
tasks as follows:

$s_i = \max(r_i, f_{i-1})$    (3)

$f_i = s_i + c_i$    (4)

with

$s_1 = r_1$

The scheduling problem can thus be considered as a search
over the permutation space. A permutation (sequence) is feasible
if the corresponding schedule is feasible. Notice that
for any permutation schedule derived as above, equation (1) is
implied by (3) and we only need to verify the deadline constraints
for the tasks.
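
The recursion for the maximally packed schedule can be written out directly. The sketch below is illustrative only; the function names `packed_schedule` and `is_feasible` and the representation of a task as a tuple `(r, c, d)` are assumptions made for this example.

```python
def packed_schedule(sequence):
    """Return [(s_i, f_i), ...] for a sequence of tasks given as (r, c, d) tuples.

    Implements s_1 = r_1, s_i = max(r_i, f_{i-1}) and f_i = s_i + c_i.
    """
    schedule = []
    prev_finish = None
    for r, c, d in sequence:
        start = r if prev_finish is None else max(r, prev_finish)
        finish = start + c
        schedule.append((start, finish))
        prev_finish = finish
    return schedule


def is_feasible(sequence):
    """Only the deadline constraints need checking; r_i <= s_i holds by construction."""
    return all(finish <= d
               for (start, finish), (r, c, d) in zip(packed_schedule(sequence), sequence))
```

For example, with tasks A = (0, 2, 5) and B = (1, 3, 6), the sequence [A, B] is feasible while [B, A] is not, because A would finish at time 6, after its deadline of 5.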

¹ $|W_i| = d_i - r_i$

Figure 2: Infeasibility of task A executing before task B

3  Temporal  Analysis 

Temporal analysis uses pairwise schedulability analysis of
tasks to generate a set of relations to eliminate sequences
which cannot lead to feasible solutions. In this section we define
the temporal relations and show how they may be derived
from the timing constraints of tasks.

3.1  Definitions  of  Temporal  Relations 

Consider two tasks $\tau_i$ and $\tau_j$. We wish to find out what we
can say about the relative ordering of these tasks, given their
timing constraints. The relations below identify the different
possibilities.

Precedence Relation: A precedence relation, denoted as
$\tau_i \rightarrow \tau_j$, implies that in any feasible schedule $\tau_i$ must
execute before $\tau_j$.

Infeasible Relation: An infeasible relation, denoted by
$\tau_i \otimes \tau_j$, implies that in any feasible schedule, $\tau_i$ and $\tau_j$
cannot run in a sequential order.

Concurrent Relation: $\tau_i \parallel \tau_j$ if there is no precedence or
infeasible relation between them. A concurrent relation
indicates that a feasible schedule may exist with any order
of the tasks $\tau_i$ and $\tau_j$. It does not, however, indicate
the existence of a feasible schedule.

For each task $\tau_i$ let us define two terms $e_i$ and $l_i$ denoting
the earliest finish time and the latest start time as:

$e_i = r_i + c_i$    (5)

$l_i = d_i - c_i$    (6)

A preliminary set of relations can be established using the
following rules, for every pair of tasks $\tau_i$ and $\tau_j$.

$(e_i \le l_j) \wedge (l_i < e_j) \Rightarrow \tau_i \rightarrow \tau_j$    (7)

$(e_i > l_j) \wedge (l_i \ge e_j) \Rightarrow \tau_j \rightarrow \tau_i$    (8)

$(e_i \le l_j) \wedge (l_i \ge e_j) \Rightarrow \tau_i \parallel \tau_j$    (9)

$(e_i > l_j) \wedge (l_i < e_j) \Rightarrow \tau_i \otimes \tau_j$    (10)

The basic idea is that if the earliest finish time of a task A is
greater than the latest start time of a task B, then a feasible
schedule cannot be found in which A is scheduled before B
(Figure 2). Thus, for instance, the first part of the condition for
rule 10 says that $\tau_i$ cannot precede $\tau_j$, and the second part
says that $\tau_j$ cannot precede $\tau_i$, establishing the infeasible
relation.

Figure 3: Window Modification (A → B)

Figure 5: Precedence Graph for example of Figure 4
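
The pairwise rules can be expressed compactly by asking two questions: can the first task precede the second, and can the second precede the first? The following Python sketch is illustrative; the `Task` tuple and the string labels returned by `relation` are assumptions of this example, and the comparisons follow the stated idea that A cannot precede B exactly when the earliest finish time of A is greater than the latest start time of B.

```python
from collections import namedtuple

Task = namedtuple("Task", "r c d")  # ready time, computation time, deadline


def relation(ti, tj):
    """Classify the temporal relation between two tasks."""
    e_i, l_i = ti.r + ti.c, ti.d - ti.c   # earliest finish, latest start of ti
    e_j, l_j = tj.r + tj.c, tj.d - tj.c   # earliest finish, latest start of tj
    i_can_precede_j = e_i <= l_j
    j_can_precede_i = e_j <= l_i
    if i_can_precede_j and not j_can_precede_i:
        return "i->j"          # precedence: ti must run before tj
    if j_can_precede_i and not i_can_precede_j:
        return "j->i"          # precedence: tj must run before ti
    if i_can_precede_j and j_can_precede_i:
        return "concurrent"    # either order may be feasible
    return "infeasible"        # neither order is possible
```

For instance, `relation(Task(0, 2, 5), Task(1, 3, 6))` yields a precedence of the first task over the second, since the second task cannot finish early enough for the first to start after it.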

3.2  Window  Modification 

Consider two tasks $\tau_a$ and $\tau_b$, and a precedence relation
$\tau_a \rightarrow \tau_b$ between them. As this indicates that in any feasible
schedule $\tau_a$ must precede $\tau_b$, we can update the timing
windows as follows (Figure 3):

$d_a' = \min(d_a, l_b)$    (11)

$r_b' = \max(r_b, e_a)$    (12)

The  window  modification  does  not  alter  the  scheduling 
problem  in  the  sense  that  every  feasible  sequence  with  the 
original  timing  constraints  is  a  feasible  sequence  with  the 
modified  timing  constraints  and  vice-versa.  Further,  the 
schedules  for  feasible  sequences  are  identical  in  both  cases. 
A  task’s  window  may  shrink  because  of  window  modification. 
This  may  lead  to  a  change  in  the  relation  of  the  modified  task 
with  other  tasks.  The  procedure  may  be  applied  iteratively 
till  no  further  changes  can  be  made  or  an  infeasible  relation 
is  detected. 
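
A minimal sketch of the iterative procedure is given below, assuming tasks are supplied as a dictionary from a task name to `(r, c, d)`. The function name `temporal_analysis` and its return convention are hypothetical; the loop simply re-examines all ordered pairs until no window changes or an infeasible pair appears.

```python
def temporal_analysis(tasks):
    """Iterate window modification over all pairs.

    tasks: dict name -> (r, c, d).
    Returns (windows, precedences, feasible) where windows maps each name to
    its possibly shrunken [r, c, d] and precedences is a set of (a, b) pairs
    meaning a must precede b.
    """
    windows = {name: list(t) for name, t in tasks.items()}
    names = sorted(windows)
    precedences = set()
    changed = True
    while changed:
        changed = False
        for a in names:
            for b in names:
                if a == b:
                    continue
                r_a, c_a, d_a = windows[a]
                r_b, c_b, d_b = windows[b]
                e_a, l_a = r_a + c_a, d_a - c_a
                e_b, l_b = r_b + c_b, d_b - c_b
                a_first = e_a <= l_b
                b_first = e_b <= l_a
                if not a_first and not b_first:
                    return windows, precedences, False   # infeasible relation
                if a_first and not b_first:
                    precedences.add((a, b))
                    new_d_a = min(d_a, l_b)              # window modification of a
                    new_r_b = max(r_b, e_a)              # window modification of b
                    if new_d_a != d_a or new_r_b != r_b:
                        windows[a][2] = new_d_a
                        windows[b][0] = new_r_b
                        changed = True
    return windows, precedences, True
```

As a small worked case, take A = (0, 2, 3), B = (0, 2, 5) and C = (1, 2, 5). Initially A precedes both B and C, while B and C are concurrent. After the ready times of B and C are pushed to 2, each needs two units inside the interval [2, 5], neither can precede the other, and the infeasible relation is detected.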

3.3  Examples 

(a) Consider a set of five tasks as shown in Figure 4. The
temporal analysis leads us to the following set of precedence
relations, sans the redundant ones:

$\{\tau_2 \rightarrow \tau_3,\ \tau_1 \rightarrow \tau_5,\ \tau_3 \rightarrow \tau_5,\ \tau_4 \rightarrow \tau_5\}$

The set of precedence relations may be represented as
a precedence graph (Figure 5) and imposes a partial order
on the task set. Only sequences which are consistent
with this partial order need to be considered for scheduling.
For 5 tasks, the total number of permutations is
120 (= 5!). The number of total orders consistent with
the partial order of Figure 5 is 12, which is a drastic reduction
in the number of sequences that need to be considered
for scheduling. The modified task set is shown in
Figure 4(b), with the modified values in bold.

(b) As another example, consider the set of 4 tasks as shown
in Figure 6. The task set in different stages of temporal
analysis is shown, with the new temporal relations² at
each stage. This example shows how successive refinement
of temporal relations can lead to detecting infeasibility.
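
The count of 12 consistent total orders quoted in example (a) can be checked by brute force. The sketch below assumes the first relation in the set reads task 2 before task 3 (the source is hard to read at that point); the function name `count_consistent_orders` is hypothetical.

```python
from itertools import permutations


def count_consistent_orders(tasks, precedences):
    """Count permutations of tasks in which a comes before b for every (a, b)."""
    count = 0
    for order in permutations(tasks):
        position = {task: index for index, task in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedences):
            count += 1
    return count


# Precedence relations of example (a), under the assumption stated above.
EXAMPLE_A = [(2, 3), (1, 5), (3, 5), (4, 5)]
```

Under this reading, task 5 must come last and task 2 must come before task 3, which leaves 4!/2 = 12 of the 120 permutations. Enumeration is exponential and is only meant as a check for small examples.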

3.4  Complexity  of  Temporal  Analysis 

It is easy to see that the initial set of relations can be established
in $O(n^2)$ time. Further, each phase of refinement also
takes no more than $O(n^2)$. An upper bound for the number
of phases is $n$. Therefore, the worst case complexity of
temporal analysis is $O(n^3)$. In practice, however, the cost of
temporal analysis can be significantly less since concurrent relations
and relations between non-overlapping tasks need not
be generated explicitly. Furthermore, the number of phases
required to stabilize window modification can be reduced if
the release times are modified in the topological sort order
of the precedence graph and deadlines are modified in the
reverse topological sort order.

In any case, the cost of temporal analysis for static scheduling
is not significant when used in conjunction with an exponential
time scheduling algorithm. In section 5, we show empirically
that the cost of temporal analysis is not a significant
factor for static scheduling.

4  Non  Preemptive  Scheduling  using 
Temporal  Analysis 

The relations established through temporal analysis serve as a
basis for scheduling of the tasks. Temporal analysis may thus
be perceived as a pre-processing stage for scheduling. The
result of this pre-processing stage is one of the following:

1. The task set was detected to be infeasible, due to the
existence of one or more infeasible relations.

2. A set of precedence relations was established, generating
a precedence graph. The precedence graph imposes
a partial order on the set of tasks. It serves as an input
to the scheduler, which may exploit the partial order
generated to prune the search space.

4.1  Detecting  Infeasibility 

Whenever an infeasible relation exists between two tasks, it
is known that no ordering of the two tasks is feasible. Thus,
the detection of an infeasible relation at any stage in temporal
analysis indicates that the task set is infeasible. Even
though only pairwise schedulability analysis is used for establishing
relations, successive refinement of relations results in
a possible percolation of this effect to other tasks too. This
effect is exemplified in the example of Figure 6, where several
iterations lead to an infeasible relation. It must be noted
that whenever infeasibility is detected, the resulting task set
and their relations also provide good feedback as to what
caused it. The feedback information may be used to allocate
more resources, change resource allocation or allow for
selective preemption as the case may be.

² Concurrent relations are not shown.

Figure 4: Window Modification: (a) Original Task Set (b) Task Set after Temporal Analysis

Figure 6: Example for Determining Infeasibility with Temporal Analysis

4.2  Search  Technique  for  Scheduling 

The intractability of non-preemptive scheduling has led to
implicit enumeration techniques based on branch and bound
search methods. The search space is the set of all possible
permutation sequences. One way of enumerating schedules is
to generate an initial schedule and then successively refine it
using heuristics to generate “better” schedules, until a feasible
schedule is obtained [3, 17, 16].

In this paper, we concentrate on another enumeration
method which constructs a schedule in an incremental manner.
Variants of this method have been used in [4, 18, 19,
20, 21]. The search space is represented as a search tree. The
root (level 0) of the tree is an empty schedule. The nodes
of the tree represent partial schedules. A node at level $k$
gives a partial schedule with $k$ tasks. The leaves are complete
schedules. The successors of an intermediate node are immediate
extensions of the partial schedule corresponding to that
node. From a node at level $k$, there are at most $n - k$ branches,
with each branch corresponding to an extension of the partial
schedule by appending one more task to the schedule. Search
is done in a branch and bound manner, wherein parts of the
search tree are pruned when it is determined that no feasible
schedule can arise from them. For each node being expanded,
the following conditions must hold.

1.  All  immediate  extensions  of  the  node  must  be  feasible 
[4,  18]. 

2.  The  remaining  computational  demand  must  not  exceed 
the  difference  between  the  largest  deadline  of  remaining 
tasks  and  current  scheduling  time  [4]. 

If  any  condition  is  violated  then  no  feasible  schedule  can 
be  generated  in  the  subtree  originating  from  this  node.  No 
search  is  conducted  on  the  subtree  rooted  at  such  a  node. 
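
The incremental search with the two pruning conditions can be sketched as follows. This is a simplified illustration, not the algorithm evaluated in the paper: the function name `find_schedule`, the depth-first order over sorted task names, and the optional `precedences` argument (pairs `(a, b)` meaning a must precede b, as produced by temporal analysis) are all assumptions of this example.

```python
def find_schedule(tasks, precedences=()):
    """Depth-first branch and bound over task sequences.

    tasks: dict name -> (r, c, d). Returns a feasible sequence of names, or None.
    """
    def search(sequence, now, remaining):
        if not remaining:
            return sequence
        # Condition 2: remaining demand must fit before the largest remaining deadline.
        demand = sum(tasks[t][1] for t in remaining)
        if demand > max(tasks[t][2] for t in remaining) - now:
            return None
        # Condition 1: every immediate extension must itself be feasible.
        for t in remaining:
            r, c, d = tasks[t]
            if max(now, r) + c > d:
                return None
        for t in sorted(remaining):
            # Skip t while one of its predecessors is still unscheduled.
            if any(a in remaining for a, b in precedences if b == t):
                continue
            r, c, d = tasks[t]
            result = search(sequence + [t], max(now, r) + c, remaining - {t})
            if result is not None:
                return result
        return None

    start = min((r for r, c, d in tasks.values()), default=0)
    return search([], start, frozenset(tasks))
```

Supplying the precedence relations from temporal analysis removes branches that violate the partial order before they are expanded, which is where the reduction in search effort comes from.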

4.2.1  Heuristically  Guided  Scheduling 

Heuristics are commonly used to guide search in many combinatorial
searching problems. For non-preemptive scheduling
Figure 8: Success Ratio vs Cut-off-Time, fic = 5.0/ic, COVc = 2.0

developing analytic techniques. Temporal analysis is a step
in this direction and provides an efficient way of analyzing a
task set and deducing valuable information for scheduling.

The  existence  of  an  infeasible  relation  in  a  task  set  gives 
a  sufficient  condition  for  infeasibility.  This  provides  an  early 
test  for  infeasibility,  which  can  then  be  used  as  a  basis  for 
selective preemption to enhance feasibility. Alternatively, the
detection of infeasibility may be used to allocate more resources
or change resource allocation.

The precedence relations generated as a result of temporal
analysis impose a partial order on the task set and may be
effectively used to prune the search space for scheduling. Our
simulations confirm that temporal analysis helps in improving
the performance of a scheduling algorithm without incurring
a significant overhead. In the simplest scheduling case, when
heuristics perform very well, temporal analysis might be perceived
as a way of formalizing the heuristics. For static time
driven scheduling to be a feasible technique, it becomes imperative
that the scheduling cost be controlled as the size of
the problem increases. Temporal analysis provides a step in
the right direction.

In this paper we have been concerned with single processor
scheduling. An interesting extension of temporal analysis
would be to use it for multi-processor scheduling. One way
to extend the analysis to multi-processor scheduling is to perform
it in two phases. In the first phase the infeasible and
concurrent relations may be used to obtain an allocation of
tasks to processors. Then in the second phase, the analysis
shown in this paper can be used for each processor for
scheduling.

Many real-time system specifications impose relative timing
constraints on the tasks [23, 24]. In this paper, we have
restricted ourselves to absolute constraints on the start and
finish times of tasks. When more complex constraints are
imposed on tasks, the role of temporal analysis in reducing
the search space becomes even more important since simple
heuristics are unlikely to perform well. It would be interesting
to see how temporal analysis can be extended to use such
constraints to further prune the search space.

Figure 9: Success Percentage vs Cut-off-Time, $\mu_L = 10.0\,\mu_C$, $COV_L = 1.0$

We  are  currently  implementing  a  scheduling  tool  based  on 
the  results  shown  in  this  paper.  The  tool  is  being  developed 
for  the  MARUTI  project,  an  experimental  real-time  system 
prototype  being  developed  at  the  University  of  Maryland, 
based  on  the  concept  of  pre-scheduling[6]. 
heuristics may be used to guide search along paths which are
more likely to lead to feasible schedules. Search is done in a
depth first manner until either a complete feasible schedule is
found, in which case the search terminates, or it is determined
that no possible extensions of the current node can lead to a
feasible schedule. Heuristics are used to determine which of
the many children of a node should be searched next. Backtracking
takes place when no further extensions of a node can
be made. We evaluate temporal analysis using such a heuristic
search for scheduling.

5  Empirical  Evaluation  of  Temporal 
Analysis 

In the previous sections, we have shown how temporal analysis
may be used to restrict the search space for scheduling.
Clearly, the existence of even a few precedence relations results
in a drastic reduction of the search space¹. However, the
usefulness of the scheme is not obvious since we are only interested
in feasible schedules, hence a large part of the search
space may never need to be examined. We have conducted
various simulations to verify that temporal analysis does indeed
result in improved performance for scheduling. For reasons
of space, we mention only a few significant results.
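
The size of the reduction can be checked with a short counting argument. The notation here is ours, not the paper's: $\sigma$ ranges over the $n!$ orderings of $n$ tasks (the leaves of an unpruned search tree), and $\sigma(i)$ is the position of task $i$ in the ordering.

```latex
\[
  \bigl|\{\sigma : \sigma(i) < \sigma(j)\}\bigr| = \frac{n!}{2},
  \qquad
  \bigl|\{\sigma : \sigma(i) < \sigma(j),\ \sigma(k) < \sigma(l)\}\bigr| = \frac{n!}{4}
  \quad (i, j, k, l \text{ distinct}).
\]
```

For $n = 4$ this takes the number of candidate orderings from 24 to 12 with one precedence relation, and to 6 with two relations on disjoint pairs of tasks.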

We used a heuristic search technique for scheduling as described
in section 4.2. The heuristic used for our simulation
study was a two level heuristic. The primary heuristic was
earliest start time (EST).

$EST_i := \max(r_i, f_k)$

where k is the last task in the partial schedule at that node, $r_i$ is the
ready time of task i, and $f_k$ is the finish time of task k.

In  the  case  of  a  conflict,  the  secondary  heuristic  earliest 
deadline  was  used.  Further  conflicts  were  resolved  arbitrarily. 
The  heuristic  has  a  natural  intuitive  appeal  and  is  known  to 
produce  good  results  among  linear  heuristics[22]. 
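
A depth-first search guided by this two level heuristic can be sketched as below. This is a minimal illustration, not the simulator used in the study: the `Task` record, `MAX_TASKS`, and all function names are hypothetical, and the only pruning performed is rejecting a child that would miss its own deadline.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 16   /* arbitrary bound for this sketch */

typedef struct { long ready, comp, deadline; } Task;

/* Earliest start time of a task when the partial schedule ends at `now`. */
static long est(const Task *t, long now)
{
    return now > t->ready ? now : t->ready;
}

/* Primary key: smaller EST. Secondary key: earlier deadline.
 * Remaining ties keep the first candidate seen (arbitrary choice). */
static bool better(const Task *a, const Task *b, long now)
{
    long ea = est(a, now), eb = est(b, now);
    if (ea != eb)
        return ea < eb;
    return a->deadline < b->deadline;
}

static bool search(const Task *t, size_t n, bool *done, long now,
                   size_t *order, size_t depth)
{
    if (depth == n)
        return true;                      /* complete feasible schedule */

    bool tried[MAX_TASKS] = { false };
    for (;;) {
        size_t best = n;                  /* n means "none found" */
        for (size_t i = 0; i < n; i++) {
            if (done[i] || tried[i])
                continue;
            if (best == n || better(&t[i], &t[best], now))
                best = i;
        }
        if (best == n)
            return false;                 /* children exhausted: backtrack */
        tried[best] = true;

        long finish = est(&t[best], now) + t[best].comp;
        if (finish > t[best].deadline)
            continue;                     /* infeasible extension */

        done[best] = true;
        order[depth] = best;
        if (search(t, n, done, finish, order, depth + 1))
            return true;
        done[best] = false;               /* undo and try the next child */
    }
}

/* Fills order[0..n-1] with task indices and returns true on success. */
bool schedule(const Task *t, size_t n, size_t *order)
{
    bool done[MAX_TASKS] = { false };
    if (n > MAX_TASKS)
        return false;
    return search(t, n, done, 0, order, 0);
}
```

With tasks (ready 0, computation 4, deadline 10) and (ready 1, computation 2, deadline 3), the heuristic first tries the task with the smaller EST, finds that the other task can no longer meet its deadline, backtracks, and succeeds with the opposite order.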

For each set of parameters, we generated 200 “feasible”
task sets with 100 tasks each. The task sets were generated
with 100% utilization as this presents the most difficulty
for scheduling. The computation times were generated using
a uniform distribution and the laxities using a normal distribution.
We compared the success percentage (i.e. the percentage of
successfully scheduled task sets) of scheduling with and without
temporal analysis as a pre-processing stage. The success percentage
(SP) is plotted against “cut-off-time”, indicating the
maximum time allowed to the scheduling algorithm to
successfully generate a schedule.

Our simulation results show that temporal analysis is not
needed for scheduling when both the mean and the variation
in laxities are low, since the simple heuristics were able to schedule
almost all task sets (success ratio ≈ 1.0). However, when
the laxities are high (as compared to computation times) and
the variation in laxities is also high², then the heuristics do

¹ Even one relation reduces the search space by half.

² Note that the task set utilization is 100%.


Figure 7: Success Ratio vs Cut-off-Time, $\mu_L = 5.0\,\mu_C$, $COV_L = 1.0$

not perform as well and the use of temporal analysis results
in a 10-20% improvement in success ratio.

As an illustration, we show a few plots of the
success percentage (SP) of scheduling with temporal analysis
(TAS) contrasted with the success percentage of scheduling
without temporal analysis, i.e. the baseline scheduling model
(BM). For scheduling with temporal analysis, we consider two
cases, one in which the overhead of temporal analysis is added to
the scheduling time (TAS+) and the other in which it is not (TAS-).
The parameters varied are the mean laxity $\mu_L$, expressed in terms of
the mean computation time $\mu_C$, and the coefficient of variation
for laxity $COV_L$. Figures 7 and 8 show the plots for low
laxity mean with low and high variation. For this case, there
is no significant performance improvement due to temporal
analysis and both schemes achieve almost 100% success percentage.
On the other hand, when the average laxity is high
(Figures 9 and 10), coupled with high variation, we see that
temporal analysis results in significant improvement in performance.
The plots also show that the curves for (TAS+)
and (TAS-) are almost identical, showing that the overhead of
temporal analysis is minimal when compared to the scheduling
costs.

6  Concluding  Remarks 

In this paper we have presented temporal analysis as a technique
for analyzing the timing relationships among a set of
tasks to establish constraints on scheduling which are discernible
from a pairwise analysis. The implications and the
benefits of the approach as a pre-processing stage for scheduling
have been shown through examples and simulation.

Time  Driven  Scheduling  theory  has  relied  heavily  on  search 
techniques  for  scheduling  and  little  work  has  been  done  in 
Figure 10: Success Percentage vs Cut-off-Time, $\mu_L = 10.0\,\mu_C$, $COV_L = 2.0$
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Abstract 

The  Maruti  Real-Time  Operating  System  was  developed  for  applications  that  must 
meet  hard  real-time  constraints.  In  order  to  schedule  real-time  applications,  the  timing 
and  resource  requirements  for  the  application  must  be  determined.  The  development 
environment  provided  for  Maruti  applications  consists  of  several  stages  that  use  various 
tools  to  assist  the  programmer  in  creating  an  application.  By  analyzing  the  source  code 
provided  by  the  programmer,  these  tools  can  extract  and  analyze  the  needed  timing  and 
resource  requirements.  The  initial  stage  in  development  is  the  compilation  of  the  source 
code  for  an  application  written  in  the  Maruti  Programming  Language  (MPL).  MPL  is 
based on the C programming language. The MPL Compiler was developed to provide
support  for  requirement  specification.  This  report  introduces  MPL  and  describes  the 
implementation  of  the  MPL  Compiler. 
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1  Introduction 


A  real-time  system  requires  that  an  application  meet  the  timing  constraints  specified  for  it. 
For  hard  real-time,  a  failure  to  meet  the  specified  timing  constraints  may  result  in  a  fatal 
error [2]. Timing constraints are not as critical for soft real-time. The Maruti Operating
System  was  developed  to  meet  the  real-time  constraints  required  by  many  applications.  In 
order  to  schedule  and  run  an  application  under  Maruti,  the  timing  and  resource  requirements 
for  that  application  must  be  determined.  The  development  environment  for  Maruti  consists 
of  several  tools  that  can  be  used  to  extract  and  analyze  these  requirements  [2]. 

The  Maruti  Programming  Language  (MPL)  is  a  language  developed  to  assist  users  in 
creating  applications  that  can  be  run  under  Maruti.  MPL  is  based  on  the  C  programming 
language,  and  assumes  the  programmer  is  familiar  with  C.  MPL  provides  some  additional 
constructs  that  are  not  part  of  standard  C  to  allow  for  resource  and  timing  specification  [1]. 
In  addition,  when  an  MPL  file  is  compiled,  some  of  the  resource  requirements  can  be 
recognized  and  recorded  to  an  output  file.  This  output  file  is  used  as  input  to  the  integration 
stage,  which  is  the  next  stage  in  the  development  cycle.  During  integration,  additional 
timing  requirements  may  be  specified. 

Previously, an MPL file was compiled by first running the source code through the
Maruti pre-compiler, which created a C file that was then compiled using a C compiler [1].
The  pre-compiler  extracted  the  necessary  information,  and  converted  the  MPL  constructs 
that  were  not  valid  C  statements  into  C  code.  This  required  the  additional  pass  of  the 
pre-compiler  over  the  source  code.  We  have  created  a  compiler  for  MPL  that  integrates 
both  the  actions  of  the  pre-compiler  and  the  compiler  into  one  stage.  In  this  report,  we 
present  MPL,  and  a  description  of  the  compiler  we  implemented.  Section  2  defines  the 
abstractions  used  in  Maruti.  In  Section  3,  the  syntax  of  the  constructs  unique  to  MPL  is 
defined.  The  details  of  the  implementation  of  the  compiler  are  given  in  Section  4.  Section  5 
describes  the  resource  information  that  is  recorded  during  compilation.  Conclusions  appear 
in  Section  6,  followed  by  an  Appendix  containing  a  sample  MPL  file,  and  the  resource 
information  recorded  for  that  file. 


2  Maruti  Abstractions 

An  MPL  application  is  broken  up  into  units  of  computation  called  elemental  units  (EUs). 
Execution  within  an  EU  is  sequential,  and  resource  and  timing  requirements  are  specified 
for each EU. A thread is a sequential unit of execution that may consist of multiple EUs.
MPL allows threads of execution to be specified by the programmer through several of the
constructs provided. A task consists of a single address space, and threads that execute in
that address space. Modules contain the source code of the application as defined by the
programmer.  An  application  may  consist  of  several  modules.  During  execution,  modules 
are  mapped  to  one  or  more  tasks. 

3  MPL  Constructs 

There  are  several  constructs  defined  in  MPL  that  are  not  a  part  of  standard  C.  These 
constructs  have  been  implemented  in  the  MPL  compiler. 
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3.1  Module  Name  Specification 

A  module  may  consist  of  one  or  more  source  files  written  in  MPL.  At  the  start  of  each  MPL 
file,  the  name  of  the  module  that  the  source  file  corresponds  to  must  be  indicated.  This  is 
given by the following syntax:

module-name-spec ::= 'module' <module-name>.

The module-name may be any valid identifier that is accepted by standard C. The module
name  specification  must  appear  at  the  beginning  of  the  source  file,  before  any  other  MPL 
code.  The  specification  is  not  compiled  into  any  executable  code.  It  is  simply  used  to 
indicate  the  module  that  the  functions  within  the  file  belong  to. 

3.2  Shared  Buffers 

A  shared  buffer  can  be  used  to  declare  memory  that  may  be  shared  by  several  tasks,  to 
permit  communication  between  the  tasks.  A  declaration  of  a  shared  buffer  requires  the  type 
be  defined  as  with  a  variable  declaration.  The  syntax  of  a  shared  declaration  is: 

shared-buffer-decl ::= 'shared' <type-specifier> <shared-buffer-name>.

The shared-buffer-name can be any valid identifier, and the type-specifier can be any
valid type for a variable. A shared declaration is compiled as a pointer to the type given in
the declaration of the shared buffer, rather than as the type itself.
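
The effect of this conversion can be pictured with a small C sketch. Only the pointer conversion comes from the text; how the pointer is bound to the shared memory is not described here, so `shared_block` and `bind_shared_buffers` are hypothetical stand-ins, and the explicit dereference in `bump` is an assumption about how the buffer is used.

```c
#include <stddef.h>

/* MPL source (not valid C):
 *     shared int counter;
 * Per the text, this is compiled as a pointer to the declared type: */
int *counter;

/* Hypothetical stand-in for the memory that several tasks would share. */
static int shared_block;

/* Hypothetical binding step; the real mechanism is not described here. */
void bind_shared_buffers(void)
{
    counter = &shared_block;
}

/* Code using the buffer reaches the data through the pointer. */
void bump(void)
{
    *counter += 1;
}

/* A second view of the same storage, as another task would see it. */
int peek_from_other_task(void)
{
    return shared_block;
}
```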

3.3  Region  Constructs 

There are two constructs used to allow for mutual exclusion within an application.

3.3.1  Region  Statement 

The  region  statement  is  used  to  enforce  mutual  exclusion  globally  throughout  an  entire 
application,  and  is  given  by  the  syntax: 

region-statement ::= 'region' <region-name>
'{' mpl-statements '}'.

The mpl-statements may be any number of valid MPL statements. These statements
make  up  a  critical  section. 

3.3.2  Local  Region  Statement 

The local_region statement is used to enforce mutual exclusion within a task, and follows
the same syntax as the region statement:

local-region-statement ::= 'local_region' <local-region-name>
'{' mpl-statements '}'.
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3.4  Channel  Declarations 


Channels are used to allow for message passing within a Maruti application. Each channel
declared has a type associated with it given by a valid C type-specifier. This type indicates
the type of data that the channel will carry.

Channels  may  be  declared  in  both  entry  and  service  functions,  which  will  be  defined 
below. The syntax for channel declarations is:

channel-declaration-list-opt ::= { channel-declaration-list }.

channel-declaration-list ::= channel-declaration { channel-declaration }.

channel-declaration ::= channel-type channels.

channel-type ::= 'out' | 'in' | 'in_first' | 'in_last'.

channels ::= channel { channel }.

channel ::= <channel-name> type-specifier.


3.5  Entry  Functions 

An  entry  function  is  a  special  type  of  function  that  may  be  defined  in  an  MPL  source  file. 
Each  entry  function  corresponds  to  a  thread  within  the  application.  The  syntax  for  an  entry 
function  definition  is: 

entry-function ::= 'entry' <entry-name> '(' ')' entry-function-body.

entry-function-body ::= channel-declaration-list-opt mpl-function-body.

3.6  Service  Functions 

Service  functions  are  another  type  of  special  function  supported  by  MPL.  A  service  function 
is  invoked  when  a  message  is  received  from  a  client.  Each  service  function  definition  requires 
an in channel and message buffer be included in the definition. The service function will
be  executed  when  there  is  a  message  on  the  channel  given  in  the  definition.  The  definition 
of  a  service  function  is  similar  to  that  of  an  entry  function: 

service-function ::= 'service' <service-name>
'(' <in-channel-name> ':' type-specifier ',' <msg-ptr-name> ')'
service-function-body.

service-function-body ::= channel-declaration-list-opt mpl-function-body.

3.7  Communication  Function  Calls 

There are several library functions used to allow for message passing within a Maruti
application.

3.7.1  Send  Calls 

Each call to the send function must specify an outgoing channel for the message:

void send ( channel channel_name, void *message_ptr );
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3.7.2  Receive  and  Optreceive  Calls 

Both receive calls and optreceive calls must be associated with an incoming channel (in,
in_first, or in_last):

void receive ( channel channel_name, void *message_ptr );
int optreceive ( channel channel_name, void *message_ptr );

A call to receive requires that there be a message on the incoming channel. Optreceive
should be used when a message may or may not be on the channel. Optreceive checks for
the message, and returns a value indicating if a message was found.

3.8  Initialization  Function 

Each  task  has  an  initialization  routine  that  is  executed  when  the  application  is  loaded.  This 
function  is  specified  by  the  user  with  the  following  name  and  arguments: 

int maruti_main (int argc, char **argv)


4  Implementation 

We started with version 2.5.8 of the Gnu C compiler. By modifying the source code for
the C compiler, we have created a compiler for applications written in MPL. In addition
to what the standard Gnu C compiler does, this modified compiler handles the additional
constructs defined in MPL, and records information about the source code that is needed
by Maruti. A source code file written in MPL is specified with an mpl extension.

4.1  Modifications  to  GCC  File  Structure 

In the process of modifying the compiler, some existing files were modified. In addition,
some new files were also created. The source code for version 2.5.8 of GCC allows compilers
to be created for several different languages: C, C++, and Objective C. The GCC compiler
uses different executable files for the different languages that it compiles. There are separate
files for C, C++, and Objective C (cc1, cc1plus, cc1obj). The GCC driver, gcc.c, uses the
extension of the source file specified to determine the appropriate executable (and therefore
language) to compile the source file. The driver then executes the compiler, passing on the
appropriate switches. The driver was modified to accept input files with an mpl extension.
The new executable created to compile MPL source files is cc1mpl. When a file
with an mpl extension is specified as a source file to be compiled, this new executable file is
used. When an MPL file is compiled, the driver automatically passes on the switch -Maruti-output,
which indicates that the needed output should be recorded to a file with an eu extension.

The executable files for each language are composed of many object files. Some of these
files are common to all the languages, and some of the files are language-specific. The
language-specific files added for compiling MPL files are those files with an mpl- prefix.

Gperf  is  a  tool  used  to  generate  a  perfect  hash  function  for  a  set  of  words.  Gperf  is  used 
to  create  a  hash  function  for  the  reserved  words  for  each  language.  The  files  containing 
the  input  to  gperf  are  indicated  by  a  file  name  with  a  gperf  extension.  There  are  several 
different  *.gperf  files  containing  the  reserved  words  for  the  different  languages  recognized  by 
the compiler. The mpl-parse.gperf file contains all the reserved words for C, in addition to
those added for MPL. For each language, the output from running gperf is then incorporated
into the *-lex.c file. This output includes a function is_reserved_word() that is used to check
if a token is a reserved word. The file mpl-lex.c is basically the c-lex.c file, with the output
of running gperf on mpl-parse.gperf instead of c-parse.gperf.
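
The role of the generated lookup can be illustrated with a hand-written stand-in. Note the substitution: gperf emits a perfect hash function, whereas this sketch uses a sorted table and the standard `bsearch`. The function name `is_mpl_reserved_word` is hypothetical, and the table holds only the words MPL adds, not the C keywords that mpl-parse.gperf also contains.

```c
#include <stdlib.h>
#include <string.h>

/* Reserved words added for MPL, kept in strcmp order for bsearch. */
static const char *const mpl_words[] = {
    "entry", "in", "in_first", "in_last", "local_region", "module",
    "optreceive", "out", "receive", "region", "send", "service", "shared"
};

/* bsearch comparator: key is a string, element is a pointer to a string. */
static int cmp_word(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns nonzero if token is one of the MPL-specific reserved words. */
int is_mpl_reserved_word(const char *token)
{
    return bsearch(token, mpl_words,
                   sizeof mpl_words / sizeof mpl_words[0],
                   sizeof mpl_words[0], cmp_word) != NULL;
}
```

A perfect hash answers the same question with a single probe and no comparisons against non-matching entries, which is why a lexer that calls the check on every identifier benefits from generating it.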

The file maruti.c contains the routines that have been written to implement MPL. This
file is linked in with the executable for all of the languages, to prevent undefined symbol
errors from occurring. Calls to the routines contained in this file occur in both the
language-specific and the common files. The flag maruti_dump is set in main() to indicate whether
information about the source code should be recorded to the appropriate output file. This
flag prevents calls to the routines in maruti.c which are made in the common files from
occurring for the languages other than MPL. The files containing these calls are:

•  calls.c

•  explow.c

•  expr.c

•  function.c

•  toplev.c

There are several reasons why the new language-specific files had to be created for
MPL.  The  files  mpl-lex.h  and  mpl-lex.c  needed  to  be  created  for  MPL  because  MPL  contains 
several  additional  reserved  words  not  present  in  C,  as  mentioned  earlier.  The  file  c-common.c 
relies  on  information  in  the  header  file  c-lex.h.  Since  MPL  uses  mpl-lex.h,  mpl-common.c 
includes  mpl-lex.h,  instead  of  c-lex.h.  Bison  is  a  tool  that  allows  a  programmer  to  define 
a  grammar  through  rules,  and  converts  them  into  a  C  program  that  will  parse  an  input 
file.  The  *-parse.y  files  are  the  bison  files  used  to  create  the  grammar  to  parse  a  source 
file.  Since  the  grammar  for  MPL  needed  to  be  modified  to  accept  the  additional  constructs, 
the  mpl-parse.y  file  was  created.  There  is  one  function  used  in  compiling  MPL  source  files 
that is defined in mpl-parse.y, instead of maruti.c. This function needed to access the static
variables declared in mpl-parse.y, and in order to do so, the function definition was placed
in that file. Finally, the file mpl-decl.c was created, because of its dependence on mpl-lex.h,
and  also  to  allow  for  an  additional  type  specification  used  in  MPL. 

4.2  Compiling  MPL  Constructs 

MPL  extends  the  C  language  to  allow  for  various  constructs.  In  order  to  implement  these 
extensions,  the  grammar  used  to  recognize  C  in  GCC  had  to  be  extended.  The  following 
are  recognized  as  reserved  words  for  MPL,  in  addition  to  the  standard  reserved  words  for  C: 
shared, region, local_region, module, in, out, in_first, in_last, entry, service, send, receive,
and optreceive. The keywords in and out were reserved words in the c-* files, because
they  are  used  by  Objective  C,  but  in  MPL  they  are  used  as  channel  types.  In  addition  to 
the  new  reserved  words,  rules  were  added  and  modified  resulting  in  the  rules  in  mpl-parse.y. 

4.2.1  Module  Name  Specification 

A  rule  was  added  to  the  grammar  to  parse  the  module  name  specification  in  an  MPL  file. 
The  rule  for  a  whole  program  was  also  modified  to  include  this  module  statement.  This 
rule  expects  the  module  statement  to  appear  before  any  other  definitions.  Since  the  module 
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name  specification  does  not  result  in  any  executable  code,  the  only  action  taken  is  to  record 
the  module  name  given  by  the  programmer. 

4.2.2  Shared  Buffers

There  are  no  rules  added  to  the  grammar  for  a  shared  buffer  declaration.  When  a  variable 
declaration  is  parsed,  a  tree  is  created  that  keeps  track  of  all  the  specification  information 
given for that declaration. For example, typedef and extern are two of the possible type
specifications. The token shared is recognized as a type specification, just as typedef and
extern are recognized. When a declaration is made, these specifications are processed in
the function grokdeclarator() in mpl-decl.c. When a shared specification is encountered,
the declaration is converted to a pointer to the type specified, instead of just the type
specified. Other than this conversion to a pointer, the declaration is compiled just as any
other declaration would be compiled in C.

4.2.3  Region  Constructs 

The region constructs are considered statements in MPL. Several rules were added to parse
these constructs, and the region and local_region statements were added as options for a
valid statement in the grammar for MPL.

Both region and local_region statements are compiled in the same manner. Each region
has a name, and a body which is the code within the critical section. In order to protect
these critical sections, calls are made to the Maruti library function maruti_eu(). When a
region is parsed, the compiler generates two calls to maruti_eu(), in addition to the code
in the body of the region. The first call is generated just before the body, and the second
call just after. These calls are generated through functions in maruti.c. The functions are
based on the actions that would have been taken, had the parser actually parsed the calls
to maruti_eu() in the source file.
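
The shape of the code generated for a region can be sketched in C. The `maruti_eu` defined here is only a counting stub so the sketch is self-contained; the behaviour of the real library routine is not described in this section. The region name `account` and the function `deposit` are hypothetical.

```c
/* Counting stub standing in for the Maruti library routine. */
int eu_calls;

void maruti_eu(void)
{
    eu_calls++;
}

int balance;

/* MPL source (not valid C):
 *
 *     region account { balance += amount; }
 *
 * Shape of the generated code, following the description above: */
void deposit(int amount)
{
    maruti_eu();         /* generated just before the body */
    balance += amount;   /* body of the region (the critical section) */
    maruti_eu();         /* generated just after the body */
}
```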

4.2.4  Channel  Declarations 

The  rules  added  for  a  channel  declaration  allow  any  number  of  channels  to  be  declared  in 
either  an  entry  or  a  service  function.  Each  channel  declaration  requires  several  pieces  of 
information: 

•  Channel-type

•  Channel-name

•  Type  specifier  indicating  the  type  of  data  that  channel  carries 

A  linked  list  of  declared  channels  is  maintained.  For  each  declared  channel  the  following 
information  is  saved: 

•  Channel-name 

•  Type  information 

1.  Size  in  bytes 

2.  String  encoding  the  type  of  the  data 

•  Channel-id 
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The  channel-id  is  a  unique  identification  number  assigned  to  each  declared  channel. 
Channel  declarations  do  not  add  to  the  compiled  code.  The  channels  are  not  allocated 
memory. The information describing each channel is simply stored in the linked list. During
compilation,  whenever  a  channel  is  referenced,  the  appropriate  information  is  obtained  from 
this  list. 
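
A minimal sketch of such a list in C is given below. The field set follows the items listed above; the struct layout, the function names, and the choice to number channel-ids from zero are assumptions made for illustration and do not reproduce the code in maruti.c.

```c
#include <stdlib.h>
#include <string.h>

/* One record per declared channel. */
typedef struct Channel {
    const char *name;        /* channel-name */
    size_t size;             /* size of the carried type in bytes */
    const char *type_code;   /* string encoding the type of the data */
    int id;                  /* unique channel-id */
    struct Channel *next;
} Channel;

static Channel *channels;    /* head of the list */
static int next_id;          /* assumption: ids count up from 0 */

/* Records a declaration; no storage for messages is allocated. */
Channel *declare_channel(const char *name, size_t size, const char *type_code)
{
    Channel *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->name = name;
    c->size = size;
    c->type_code = type_code;
    c->id = next_id++;
    c->next = channels;
    channels = c;
    return c;
}

/* Looks up a channel by name when it is referenced later in the file. */
const Channel *find_channel(const char *name)
{
    for (const Channel *c = channels; c != NULL; c = c->next)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}
```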


4.2.5  Entry  Functions 

Entry  function  definitions  are  compiled  differently  than  other  function  definitions.  An  entry 
function  would  appear  in  an  MPL  file  in  the  following  form: 

entry <entry_name> ()
<channel_declaration_list_opt>
{
    <mpl_function_body>
}


where entry_name is an identifier that is the name of the entry function, the
channel_declaration_list_opt contains any channels the user wants to define for that
function, and mpl_function_body is any function body that would be accepted as a definition in
a standard MPL function. Semantically the entry function is equivalent to the following
MPL code:

_maruti_entry_name ()
{
    while(1)
    {
        maruti_eu();
        entry_name ();
    }
}

entry_name ()
{
    mpl_function_body
}


An entry function is compiled into two functions, as if the two functions given above had
been part of the source file. Essentially, the first function is just a stub function that calls
maruti_eu(), then calls the second function compiled. As with generating function calls,
the  routines  to  generate  the  code  for  entry  function  definitions  are  based  on  the  actions 
that  would  have  been  taken  had  the  parser  actually  parsed  the  code  for  the  two  separate 
functions. 
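
The two generated functions can be mimicked in plain C as follows. This is a sketch under stated assumptions: `maruti_eu` is a counting stub, the entry name `sampler` is hypothetical, and the `while(1)` of the stub is replaced by a bounded loop so that the example terminates.

```c
int eu_calls;     /* how often the stub library routine was called */
int body_runs;    /* how often the entry function body ran */

/* Counting stub standing in for the Maruti library routine. */
void maruti_eu(void)
{
    eu_calls++;
}

/* Second generated function: the body written by the programmer. */
void sampler(void)
{
    body_runs++;
}

/* First generated function: the stub, named after the
 * _maruti_<entry_name> pattern in the text. The real stub loops
 * forever; `passes` exists only to bound this demonstration. */
void _maruti_sampler(int passes)
{
    for (int pass = 0; pass < passes; pass++) {
        maruti_eu();
        sampler();
    }
}
```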

4.2.6  Service  Functions 

Service function definitions are handled very much like entry function definitions. The
syntax  of  a  service  function  differs  slightly  from  that  of  an  entry  function,  since  it  requires 
that  an  incoming  channel  and  a  message  buffer  be  defined: 
service <service_name> (<in_channel_name> : <type_specifier>, <msg_ptr_name>)
<channel_declaration_list_opt>
{
    <mpl_function_body>
}


Like  the  entry  functions,  service  functions  are  semantically  equivalent  to  two  functions, 
where  one  is  simply  a  stub  function  calling  the  second  function  that  is  generated: 

_maruti_service_name ()
{
    type_specifier _maruti_msg_ptr_name;

    while(1)
    {
        if ( optreceive ( _maruti_in, id, &_maruti_msg_ptr_name, size ) )
        {
            service_name ( &_maruti_msg_ptr_name );
        }
    }
}

service_name (msg_ptr_name)
type_specifier *msg_ptr_name;
{
    mpl_function_body
}


The service_name, channel_declaration_list, and mpl_function_body are all the same
as described previously for entry functions. In addition, service functions have two other
items specified in their definitions. The first is a channel. Every service function requires
that a channel be specified. This channel is always declared as an in channel with the name
in_channel_name. The type is given by type_specifier, as if it had been declared in the
channel_declaration_list. The channel is used to invoke the service function. This in
channel is used by the optreceive in the stub function that calls the function containing the
service function body. When a message is received on this channel, the service function is
executed. The second additional item is a message buffer used by the service function. The
name of this message buffer is given by msg_ptr_name, and its type is given by type_specifier.
This buffer is used to hold the message received from the client that invoked the service
function, and is passed to the second function containing the body of the service function.
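
The stub-and-body split described above can be illustrated in plain C. This is only a sketch, not compiler output: optreceive is replaced by a hypothetical stand-in (mock_optreceive) that drains a small array, the message type and the names display_time and display_time_stub are chosen for illustration, and the loop is bounded so the example terminates, whereas the generated stub loops forever.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message type carried on the service's in channel. */
typedef struct { int hours, minutes, seconds; } time_type;

/* Stand-in for the runtime queue behind the in channel (assumption). */
time_type pending[2] = { {1, 2, 3}, {4, 5, 6} };
int next_msg = 0;
int handled = 0;        /* counts invocations of the service body */
int last_seconds = -1;  /* last value seen by the service body */

/* Mock of optreceive: copies the next message into buf and returns 1,
   or returns 0 when no message is waiting. */
int mock_optreceive(void *buf, size_t size)
{
    assert(size <= sizeof pending[0]);
    if (next_msg >= 2)
        return 0;
    memcpy(buf, &pending[next_msg++], size);
    return 1;
}

/* Second function: holds the user-written service body. */
void display_time(time_type *msg)
{
    handled++;
    last_seconds = msg->seconds;
}

/* Stub: owns the message buffer and forwards each received message.
   The generated stub uses while (1); max_polls bounds this sketch. */
void display_time_stub(int max_polls)
{
    time_type buf;
    int i;

    for (i = 0; i < max_polls; i++) {
        if (mock_optreceive(&buf, sizeof buf))
            display_time(&buf);
    }
}
```

Calling display_time_stub(5) polls five times; the body runs only for the two polls that find a message waiting, which mirrors how the service is executed only when a message arrives on its channel.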

4.2.7  Communication  Function  Calls 

Three library functions for message passing were mentioned previously: send,
receive, and optreceive. Function calls to any of these three library functions are handled
differently from other function calls. In the MPL grammar, send, receive, and optreceive
are all reserved words. The MPL syntax for all of these calls is the following:

<function-name> (<channel-name>, <parameter-2>);

Channel-name should be a previously declared channel, and parameter-2 should be a
pointer. These function calls must be compiled differently, since these are not the actual
parameters used when the call is generated. In the case of a call to send, the actual
parameters must be as follows:

send (<channel-id>, <parameter-2>, <channel-size>);

In the case of a call to either receive or optreceive, the parameters required are:

receive | optreceive (<channel-type>, <channel-id>, <parameter-2>, <channel-size>);

The channel-type for a receive or optreceive call is an integer generated by the compiler
that will indicate an in, in_first, or in_last channel.

When one of these three function calls is encountered, there are special rules in the
grammar to handle it. A function in maruti.c is called which generates the appropriate
parameters, and then the function call itself. These function calls are generated as
mentioned above for the calls to maruti_eu(). The channel-name specified by the user is used
to obtain the necessary parameters. Given the channel name, the linked list of channels is
searched to find the corresponding channel, then the channel-id and the channel-size are
obtained from that node in the linked list. There is also some type checking done at this
stage. The compiler verifies that only an outgoing channel is specified for a send call, or an
incoming channel for the receive and optreceive calls. The compiler also checks that any
channel referenced has been previously defined.
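
The lookup and direction check for a send call can be sketched as follows. The structure layout, the field names, and the function names find_channel and expand_send are assumptions made for illustration; the actual compiler works on GCC's internal representation rather than on strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum chan_dir { CHAN_IN, CHAN_OUT };

/* One node per declared channel (field names are illustrative). */
struct channel {
    const char *name;
    enum chan_dir dir;
    int id;
    int size;                 /* size of the message type in bytes */
    struct channel *next;
};

/* Walk the list of declared channels; NULL means "not declared". */
const struct channel *find_channel(const struct channel *head, const char *name)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->name, name) == 0)
            return head;
    return NULL;
}

/* Expand send(<name>, <arg>) into the three-parameter form.
   Returns 0 on success, -1 if the channel is unknown or not outgoing. */
int expand_send(const struct channel *head, const char *name,
                const char *arg, char *out, size_t outlen)
{
    const struct channel *ch;

    assert(out != NULL && outlen > 0);
    ch = find_channel(head, name);
    if (ch == NULL || ch->dir != CHAN_OUT)
        return -1;
    snprintf(out, outlen, "send(%d, %s, %d);", ch->id, arg, ch->size);
    return 0;
}
```

A receive or optreceive expansion would follow the same pattern, requiring an incoming channel and prepending the integer channel-type.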

The  grammar  for  MPL  was  modified  so  that  a  call  to  any  of  the  communication  functions 
may occur anywhere that a primary expression occurs, since that is where other function
calls  are  permitted  to  occur. 

4.2.8  Initialization  Function 

The user-defined function maruti_main() is compiled as an ordinary C function.

5  PEUG  File 

The source code of an MPL file is broken up into elemental units. Each elemental unit
identifies the resources that it requires. These elemental units are used later in the
development process for scheduling the application. The output file created by the MPL compiler
contains a Partial Elemental Unit Graph (PEUG) for the given source file. The name of this
file is the name of the source file, with the mpl extension replaced by an eu extension.
There are several different types of information recorded in this PEUG file.
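
The naming rule for the output file can be written as a small helper. The function name peug_name and its error convention are assumptions for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Derive "<base>.eu" from "<base>.mpl". Returns 0 on success, -1 if src
   does not end in ".mpl" or the result does not fit in out. */
int peug_name(const char *src, char *out, size_t outlen)
{
    const char *ext = ".mpl";
    size_t len = strlen(src);
    size_t extlen = strlen(ext);
    size_t base;

    assert(out != NULL);
    if (len <= extlen || strcmp(src + len - extlen, ext) != 0)
        return -1;
    base = len - extlen;
    if (base + strlen(".eu") + 1 > outlen)
        return -1;
    memcpy(out, src, base);      /* copy the name without its extension */
    strcpy(out + base, ".eu");   /* append the new extension */
    return 0;
}
```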

5.1  Module  Name 

The first line in the output file indicates the name of the module, and will appear as:

peug <module-name>

The module-name is taken directly from the module name specification given in the MPL
source file.

5.2 File Name

The second line in the output file indicates the name of the target file that is created by the
compiler, where file-name is the target:

file <file-name>


5.3  Shared  Buffers 

Each time a shared buffer is declared, its name and type information are recorded in the
output file:

shared <shared-buffer-name> : (<type-description-string>, <type-size>)

The type-description-string and type-size of a shared buffer are obtained from the
type specification, and are represented in the same manner as the type and size for a
channel. Although the shared buffer is actually a pointer to the type it is declared as, the
type-description-string represents the object being pointed to, and not the pointer itself.

5.4  Entry,  Service,  and  User  Function  Definitions 

In MPL, a user may define ordinary functions in addition to the entry and service functions
that are permitted in MPL. For each entry, service, or ordinary user-defined function, there
is an entry in the output file. This entry has the following format:

<function-type> <function-name>


size <stack-size>

Function-type can be either function, entry, or service, indicating which type of function
is being defined. Function-name is the declared name of the function in the source file.
Stack-size is the maximum stack size needed by this function. This stack-size includes the
arguments pushed onto the stack preceding any function calls occurring within the function
body. There will also be other information concerning the body of the function that will
appear between the function-name and the stack-size. The entry for the maruti_main()
function will be the same as those for other user-defined functions. Entry and service
functions will contain some additional information not applicable to ordinary functions
that will be described below.

5.4.1  Channels 

For each channel that is declared, a description of the channel is written to the output file.
These descriptions will occur right after the statement indicating the name of the current
function:

<channel-type> <name> : (<description-string>, <size>)

The channel-type and channel-name will be the type and name specified in the source
file. The description-string and size are based on the type specification in the channel
declaration. Channel descriptions will occur only in entry and service functions. A service
function will always contain at least one channel description, since the syntax of a service
function requires a channel be named in the definition. A channel description will also be
output for every send, receive, and optreceive call, since these calls require a channel as one
of their parameters.

5.4.2  Function  Calls 

Each time a function call is parsed, there will be a line in the output file:

calls <function-name> {in_cond} {in_loop}

This line indicates where a function call occurs, and which function is being called. The
in_cond and in_loop indicate if this function call appears within a conditional statement or
within a loop. These labels will be seen only if their respective conditions are true.

5.4.3  Communication  Function  Calls 

Any call to a communication function is recorded similarly to other function calls. There is
a  line  indicating  the  name  of  the  function,  as  shown  above  for  a  function  call.  In  addition, 
there  will  be  a  line  describing  the  channel  associated  with  that  communication  function  call. 
This  line  will  appear  just  as  the  line  for  the  channel  definition  described  above  appears. 

5.4.4  EU  Boundaries 

The output file for an MPL source file indicates where each elemental unit (EU) begins by
the following:

eu <K> <region_list>

The K indicates an EU number. Each EU within a source file has a unique number.
There are several places where EU boundaries are created:

•  Start of a function

•  Start of a region

•  End of a region

•  Explicit calls to maruti_eu()

The initial EU occurring at the beginning of a function that is not a service or entry function
is a special case. This is always labeled as "eu 0" in the output file, and does not represent
an actual EU.

Each EU may also be followed by a list describing one or more regions. This list
represents the regions that this EU occurs within. The description of a region appears as:

(region-name instance access type)

The region-name is just that given by the user, and the type indicates if a region is local
(local_region construct) or global (region construct). The access indicates if the access is
read or write. The instance indicates the instance of this region within the source file.
Each instance for a region within a source file is unique.
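
As an illustration of the record layout just described, the following sketch formats one EU line with its region list. The struct, the function name format_eu, the single-letter access codes ('R' and 'W'), and the word "local" for local regions are assumptions; only the field order (name, instance, access, type) comes from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct region_ref {
    const char *name;   /* region name given by the user */
    int instance;       /* unique per source file */
    char access;        /* 'R' or 'W' (assumed encoding) */
    int is_global;      /* 1: region construct, 0: local_region construct */
};

/* Write "eu <K>" followed by one parenthesized entry per enclosing region.
   Returns the number of characters written, or -1 on truncation. */
int format_eu(char *out, size_t outlen, int eu_number,
              const struct region_ref *regions, int nregions)
{
    int used, i;

    assert(out != NULL && outlen > 0);
    used = snprintf(out, outlen, "eu %d", eu_number);
    if (used < 0 || (size_t)used >= outlen)
        return -1;
    for (i = 0; i < nregions; i++) {
        size_t room = outlen - (size_t)used;
        int n = snprintf(out + used, room, " (%s %d %c %s)",
                         regions[i].name, regions[i].instance,
                         regions[i].access,
                         regions[i].is_global ? "global" : "local");
        if (n < 0 || (size_t)n >= room)
            return -1;
        used += n;
    }
    return used;
}
```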
6  Conclusions


Basing MPL on C has simplified the development of both the language and its compiler.
The language is easy to learn for any programmer who has used C before, since there
are a limited number of additional constructs unique to MPL. Using the GCC C source
code provided an existing compiler, rather than implementing a new one. The source code
for GCC only needed to be modified to handle some additional constructs, and produce
some additional output. This made the implementation fairly simple. However, the GCC
C compiler also provides some functionality that is not needed by MPL. Much of this
extra functionality is not even permitted in MPL. These restrictions are not enforced by the
compiler, but should be detected within the development cycle.

Prior to the development of the MPL compiler using GCC, compiling an MPL source
file required two steps. The source files were initially passed through a pre-compiler to
extract the available resource information and parse the MPL constructs. The pre-compiler
was responsible for converting the MPL code into valid C code, which was then compiled
using a standard C compiler. The new implementation of the compiler eliminates some
of the redundant processing that is done when the pre-compiler is used. The information
obtained through the pre-compiler already existed in the internal structure used by the GCC
compiler. This information just needed to be recorded. Instead of parsing source code files
in the two steps independently, the functionality of the pre-compiler has been incorporated
into the compiler itself. The MPL compiler provides a single tool that extracts all the
available information at the initial stage of development.

In the future, a version of MPL may be implemented that is based on the Ada
programming language. GNAT is a compiler for Ada 9X that is being developed at NYU.
GNAT depends on the backend of the GCC compiler. Using the source code for GNAT,
an implementation of MPL based on Ada would be similar to the current implementation
based on C.




Appendix 


A  MPL  File 

The  following  is  a  sample  of  MPL  source  code: 

module timer;

typedef struct {
    int seconds;
    int minutes;
    int hours;
} time_type;

shared time_type global_time;

maruti_main(argc, argv)
int argc;
char **argv;
{
    global_time->seconds = 0;
    global_time->minutes = 0;
    global_time->hours = 0;

    return 0;
}

entry update_second()
out disp : time_type;
{
    time_type msg;

    region time_region {
        global_time->seconds++;
        if (global_time->seconds == 60)
            global_time->seconds = 0;
        msg = *global_time;
    }

    send (disp, &msg);
}

entry update_minute()
out display : time_type;
{
    time_type msg;

    region time_region {
        global_time->minutes++;
        if (global_time->minutes == 60)
            global_time->minutes = 0;
        msg = *global_time;
    }

    send (display, &msg);
}

entry update_hour()
out display : time_type;
{
    time_type msg;

    region time_region {
        global_time->hours++;
        if (global_time->hours == 24)
            global_time->hours = 0;
        msg = *global_time;
    }

    send (display, &msg);
}

service display_time(inchan : time_type, time)
{
    printf("Current Time: %d : %d : %d", time->hours, time->minutes, time->seconds);
}

B  PEUG File

The  corresponding  PEUG  file  for  the  source  code  above  is 

peug timer
file timer.o

shared global_time : ($(iii), 12)

function maruti_main
eu 0
size 4

entry update_second
out disp : ($(iii), 12)
eu 2
eu 3 (time_region 1 W global)
calls maruti_eu
eu 4
calls maruti_eu
calls send
out disp : ($(iii), 12)
size 32

entry update_minute
out display : ($(iii), 12)
eu 5
eu 6 (time_region 2 W global)
calls maruti_eu
eu 7
calls maruti_eu
calls send
out display : ($(iii), 12)
size 32

entry update_hour
out display : ($(iii), 12)
eu 8
eu 9 (time_region 3 W global)
calls maruti_eu
eu 10
calls maruti_eu
calls send
out display : ($(iii), 12)
size 32

service display_time
in inchan : ($(iii), 12)
eu 11
calls optreceive
in inchan : ($(iii), 12)
calls printf
size 52

Chapter 1
Introduction


The Maruti Programming Language (MPL) is used to write Maruti application code.
Currently, MPL is based on the ANSI C programming language, with extensions to
support modules, real-time constructs, communications primitives, and shared memory.

The  Maruti  Configuration  Language  (MCL)  is  used  to  specify  how  individual 
program  modules  are  to  be  connected  together  to  form  an  application  and  the  details  of  the 
hardware  platform  on  which  the  application  is  to  be  executed. 

1.1.  General  Program  Organization 

A  complete  Maruti  system  is  called  an  application.  Applications  can  be  large,  distributed 
systems  made  up  of  many  subsystems.  Each  application  is  defined  by  a  configuration  file, 
which  defines  all  the  subsystems  and  their  interactions.  The  following  entities  make  up  an 
application: 


Jobs  Jobs are the active entities in a Maruti application. Jobs are specified in the
configuration file with timing constraints, including the job period. A job is made
up of multiple entry points, which are the threads of execution that will be run for
the job.

Modules  The code of an application is divided into modules. Each module consists of
entry points, which define the code which will be executed as part of a job,
services, which define code to be invoked on behalf of a client module, and
functions, which are called from entries and services.

Tasks  At run-time, modules map to tasks (a module may be mapped to more than one
task). Each task consists of an address space and threads of execution for the entry
points and services of the module.

Channels  Channels are the communication paths for Maruti applications. Each channel is
a one-way connection through which typed messages are passed. The end points
are defined by out and in channel specifiers, and are connected as specified in the
application configuration file. Each end point is associated with one entry or
service, and its message type and channel type are declared within the entry or
service header. The types of the in and out channel specifiers must match.

Regions  Regions are the mechanism for mutual exclusion between Maruti threads: only
one thread can enter a particular region at a time. Two types of regions may be
specified: global regions enforce exclusion for the entire Maruti application, while
local regions enforce exclusion only within a single task.

Shared buffers  Named memory buffers can be shared between tasks. The buffer is
mapped into the address space of each task that uses that buffer.

1.1.1.  Maruti Programming Language

Rather  than  develop  completely  new  programming  languages,  we  have  taken  the  approach 
of  using  existing  languages  as  base  programming  languages  and  augmenting  them  with 
Maruti  primitives  needed  to  provide  real-time  support. 

In  the  current  version,  the  base  programming  language  used  is  ANSI  C.  MPL  adds 
modules,  shared  memory  blocks,  critical  regions,  typed  message  passing,  periodic 
functions, and message-invoked functions to the C language. To make analyzing the
resource  usage  of  programs  feasible,  certain  C  idioms  are  not  allowed  in  MPL;  in 
particular,  recursive  function  calls  are  not  allowed  nor  are  unbounded  loops  containing 
externally  visible  events,  such  as  message  passing  and  critical  region  transitions. 

•  The  code  of  an  application  is  divided  into  modules.  A  module  is  a  collection  of 
procedures,  functions,  and  local  data  structures.  A  module  forms  an  independently 
compiled  unit  and  may  be  connected  with  other  modules  to  form  a  complete  application. 
Each module may have an initialization function which is invoked to initialize the
module  when  it  is  loaded  into  memory.  The  initialization  function  may  be  called  with 
arguments. 


•  Communication primitives send and receive messages on one-way, typed channels.
There are several options for defining channel endpoints that specify what to do on
buffer overflow or when no message is in the channel. The connection of two
endpoints is done in the MCL specification for the application; Maruti ensures that
endpoints are of the same type and are connected properly at runtime.

•  Periodic  functions  define  entry  points  for  execution  in  the  application.  The  MCL 
specification  for  the  application  will  determine  when  these  functions  execute. 

•  Message-invoked  functions,  called  services,  are  executed  whenever  messages  are 
received  on  a  channel. 

•  Shared  memory  blocks  can  be  declared  inside  modules  and  are  connected  together  as 
specified  in  the  MCL  specifications  for  the  application. 

•  Critical  Regions  are  used  to  safely  maintain  data  consistency  between  executing 
entities.  Maruti  ensures  that  no  two  entities  are  scheduled  to  execute  inside  their  critical 
regions  at  the  same  time. 

1.1.2.  Maruti  Configuration  Language 

MPL modules are brought together into an executable application by a specification file
written in the Maruti Configuration Language (MCL). The MCL specification determines
the application's hard real-time constraints, the allocation of tasks, threads, and shared
memory blocks, and all message-passing connections. MCL is an interpreted C-like
language rather than a declarative language, allowing the instantiation of complicated
subsystems using loops and subroutines in the specification. The key features of MCL
include:

•  Tasks, Threads, and Channel Binding. Each module may be instantiated any
number of times to generate tasks. The threads of a task are created by instantiating the
entries and services of the corresponding module. An entry instantiation also indicates
the job to which the entry belongs. A service instantiation belongs to the job of its
client. The instantiation of a service or entry requires binding the input and output ports
to a channel. A channel has a single input port indicating the sender and one or more
output ports indicating the receivers. The configuration language uses channel variables
for defining the channels. The definition of a channel also includes the type of
communication it supports, i.e., synchronous or asynchronous.

•  Resources.  All  global  resources  (i.e.,  resources  which  are  visible  outside  a  module) 
are specified in the configuration file, along with the access restrictions on the resource.
The  configuration  language  allows  for  binding  of  resources  in  a  module  to  the  global 
resources.  Any  resources  used  by  a  module  which  are  not  mapped  to  a  global  resource 
are  considered  local  to  the  module. 

•  Timing Requirements and Constraints. These are used to specify the temporal
requirements and constraints of the program. An application consists of a set of
cooperating jobs. A job is a set of entries (and the services called by the entries) which
closely cooperate. Associated with each job are its invocation characteristics, i.e.,
whether it is periodic or aperiodic. For a periodic job, its period and, optionally, the
ready time and deadline within the period are specified. The constraints of a job apply
to all component threads. In addition to constraints on jobs and threads, finer-level
timing constraints may be specified on the observable actions. An observable action
may be specified in the code of the program. For any observable action, a ready time
and a deadline may be specified. These are relative to the job arrival. An action may not
start executing before the ready time and must finish before the deadline. Each thread is
an implicitly observable action, and hence may have a ready time and a deadline.

Apart  from  the  ready  time  and  deadline  constraints,  programs  in  Maruti  can  also  specify 
relative  timing  constraints,  those  which  constrain  the  interval  between  two  events.  For  each 
action,  the  start  and  end  of  the  action  mark  the  observable  events.  A  relative  constraint  is 
used  to  constrain  the  temporal  separation  between  two  such  events.  It  may  be  a  relative 
deadline  constraint  which  specifies  the  upper  bound  on  time  between  two  events,  or  a  delay 
constraint  which  specifies  the  lower  bound  on  time  between  the  occurrence  of  the  two 
events.  The  interval  constraints  are  closer  to  the  event-based  real-time  specifications,  which 
constrain  the  minimum  and/or  maximum  distance  between  two  events  and  allow  for  a  rich 
expression  of  timing  constraints  for  real-time  programs. 

•  Replication and Fault Tolerance. At the application level, fault tolerance is achieved
by replicating part, or all, of the application to make it resilient. The
configuration language eases the task of achieving fault tolerance by providing
mechanisms to replicate modules and services, thus achieving the desired degree
of resiliency. By specifying allocation constraints, a programmer can ensure that the
replicated modules are executed on different partitions.
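
The timing constraints described in this list (ready time and deadline for an action, plus relative deadline and delay constraints between two events) reduce to simple interval checks. The sketch below is illustrative only: the type and function names are assumptions, times are in arbitrary ticks relative to job arrival, and it says nothing about how the Maruti scheduler actually enforces the constraints.

```c
#include <assert.h>

/* Start and end events of one observable action, in ticks
   relative to job arrival (assumed unit). */
struct action_times {
    long start;
    long end;
};

/* Ready time / deadline: the action may not start before `ready`
   and must be finished by `deadline`. */
int meets_window(struct action_times a, long ready, long deadline)
{
    return a.start >= ready && a.end <= deadline;
}

/* Relative deadline: at most `max_gap` ticks between two events. */
int meets_relative_deadline(long first_event, long second_event, long max_gap)
{
    assert(second_event >= first_event);
    return second_event - first_event <= max_gap;
}

/* Delay: at least `min_gap` ticks between two events. */
int meets_delay(long first_event, long second_event, long min_gap)
{
    assert(second_event >= first_event);
    return second_event - first_event >= min_gap;
}
```

For example, an action that starts at tick 10 and ends at tick 40 satisfies a window of [5, 50], a relative deadline of 30 ticks between its start and end events, and a delay constraint of 30 ticks, but not a delay constraint of 31.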
Chapter 2
Tutorial


2.1.  Basic  Maruti  Program  Structure 

Maruti applications are built up out of one or more MPL modules, and tied together with a
configuration file written in MCL. We'll start our tutorial with an explanation of a very
simple application consisting of one module, called simple.mpl. Our simple application will
contain a producer thread that sends out integer data, and a consumer thread, which
receives integer values and prints them out.

The  Module 

module simple;

int data;

maruti_main(int argc, char **argv)
{
    if (argc < 1) {
        printf("simple: requires an integer argument\n");
        return 1;
    }

    data = atoi(argv[0]);
    return 0;
}

This first part of the module will be similar in all Maruti modules. The module
always starts with the module name declaration. After the module declaration, the MPL
module is much like any ANSI C program, but with some special Maruti definitions.

Every module must contain a function named maruti_main, which initializes the
module at load time. This initialization would normally include things like device probing
or painting the screen. The maruti_main function, exactly like the main function of a C
program, takes an argument count and list as its parameters, and returns an error code to its
environment. In Maruti, the environment is the system loader, and any non-zero return
results in a load failure, in which case the application will not run. In our example,
maruti_main is responsible for setting the initial value of our datum from the environment,
and returning a failure code if there is no argument.

Periodic Functions


entry producer()
out och: int;
{
    data++;              /* produce data */
    send(och, &data);
}


The  producer  is  a  periodic  function,  or  Maruti  entry  point.  It  serves  as  the  top-level 
function  for  a  Maruti  thread  that  will  be  invoked  repeatedly,  with  a  period  specified  in  the 
MCL  config  file  (which  we  will  see  below). 

The producer outputs its data on a Maruti channel, using the built-in MPL send
function. The channel och is declared as part of the function header of producer. Maruti
channels are declared to have a type, usually a structure but in this case a simple integer.
All messages sent on the channel will be of the same type.

Note that there is no open, bind, or connect statement needed to initiate
communication on the channel. The connection of the channel will be specified in the
config file, and initiated automatically by the runtime system.

Message-Invoked Functions

service consumer(ich: int, msg)
{
    printf("consumer got %d\n", *msg);   /* consume data */
}


The consumer is a message-invoked function, or Maruti service. It serves as the
top-level function for a Maruti thread that is invoked whenever there is a message delivered
on the channel declared in the function header. The msg parameter is the name of the
pointer to the message buffer that will contain the delivered message.

Since the receipt of the invoking message is automatic for a Maruti service, the only
thing our consumer has to do is print out the data value contained in the message.

This completes our simple module, but in order to have a Maruti application, we
must have a config file that tells the system how to run our program.

The  Config  File 

The config file is written in the Maruti Configuration Language (MCL), an interpreted
C-like language with constructs that allow an application to be built up from pieces and
interconnected. The MCL processor, called the integrator, builds a program graph from
the specifications, analyses it for type correctness and completeness, and checks for
dependency cycles. Here is the config file, simple.cfg, that goes with our application:

application simple {
    job j;                      /* declare variables */
    task s1;
    channel c;

    init j: period 1 s;         /* specify job parameters */
    start s1: simple(27);       /* specify task parameters */

    <> s1.producer <c> in j;    /* producer thread */
    <c> s1.consumer <>;         /* consumer thread */
}


The variables in MCL correspond to the objects that make up an application, such
as channels, tasks, and jobs. As in C, these variables must be declared before they are
used.

In Maruti, a job is a logical collection of threads that run with the same period. All
entry functions in the application must be put in some job. The init statement sets the
period for a particular job. In our case, the job j will run once every second.

A task is the runtime instantiation of an MPL module, just as in Unix a process is
the runtime image of a program. Many tasks may be executed from the same module; each
will run independently in the Maruti application. The MCL start command instantiates a
task from a module. In our example, we instantiate one task from the module simple and
pass it the initial data value of 27.

We instantiate the threads for the entry and service functions inside a particular
task with particular input and output channels. In our example, the statement

<> s1.producer <c> in j;    /* producer thread */

instantiates the s1.producer thread in job j with no input channels and one output channel,
c. Likewise, the statement

<c> s1.consumer <>;         /* consumer thread */

instantiates the s1.consumer thread with one input channel, c, and no output channels.
Service functions are not put in a job, but rather inherit the scheduling characteristics of the
thread that is sending to their invoking channel.

The integrator checks to ensure that the use of producer and consumer in the config
file matches the declarations in the program module.

Building  and  Running  the  Application 

We can build the simple application by putting simple.mpl and simple.cfg in a directory,
and running the mbuild command there:

% ls
simple.cfg  simple.mpl
% mbuild
mbuild: extracting module info from MCL file 'simple.cfg'
mbuild: creating obj subdirectory for output files.
mbuild: generating obj/simple-build.mk
mbuild: running make -f obj/simple-build.mk

Mbuild takes care of running the MPL compiler, the MCL integrator, as well as the
analysis and binding programs needed to build the runnable Maruti application. By default,
mbuild creates both a stand-alone binary that can be booted on the bare machine, and a
Unix binary that runs in virtual real time from within the Unix development environment.
These different versions of the runtime system are called flavors.

We can try out the simple application by running the ux+x11 flavor from the
command line:

% obj/simple.ux+x11
<...  startup  messages  ...> 
consumer  got  28 
consumer  got  29 
consumer  got  30 
consumer  got  31 
consumer  got  32 
consumer  got  33 
consumer  got  34 
consumer  got  35 
consumer  got  36 
consumer  got  37 

application quit

The application boots up and outputs the consumer message once every second.
We can exit the application by typing 'q'.

2.2. Using the Graphics Library

Many Maruti programs will want to use the graphical screen as a monitor for an embedded
system, producing oscilloscope or bar-graph style displays, or for animating a simulation
or demonstration. Maruti provides a console graphics library as an integral part of the
system to make the development of visually oriented applications simpler. Our next
example application, clock, demonstrates the use of the graphics library as well as the use
of multiple jobs to take advantage of Maruti's scheduling abilities.

Figure 2.1: Display of clock example application

The  clock  application  will  display  a  circular  clock  face  on  the  screen,  with  the  hour, 
minute,  and  second  hands  moving  as  independent  Maruti  threads  in  different  jobs.  The 
clock  screen  is  shown  in  Figure  2.1. 

We  will  now  go  through  the  clock.mpl  module  and  see  how  it  works. 


module clock;

#include <maruti/mtime.h>
#include <maruti/console.h>
#include <math.h>
#include "clock.h"

#define CENTER_X (CONSOLE_WIDTH/2)   /* useful constants */
#define CENTER_Y (CONSOLE_HEIGHT/2)

void check_for_quitkey(void);        /* subroutines */
void polar_point(int pos, int radius, int *x, int *y);
void xor_triangle(int pos, int apex_radius, int color);
void xor_ray(int pos, int color);

int sec_pos, min_pos, hour_pos;      /* system state */
The first part of the module is much like any other ANSI C program, with
#includes, #defines, and function prototype declarations for subroutines to be used later in
the program. Notice the two Maruti header files included: <maruti/mtime.h> contains
declarations related to Maruti time management, and <maruti/console.h> contains
declarations that define the graphics library interface. The "clock.h" header, which we'll
see below, will contain definitions that customize the look of the clock face.


maruti_main()
{
    int i, x1, y1, x2, y2, color;
    char num_str[4];
    mtime curtime;
    mdate curdate;

    /* initialize screen library, paint screen black */
    cons_graphics_init();
    cons_fill_area(0, 0, CONSOLE_WIDTH, CONSOLE_HEIGHT, BLACK);

The maruti_main function in the clock application draws the clock face display and
initializes the system state, in our case the positions of the three clock hands. Before
drawing on the screen, the application must call cons_graphics_init and initialize the
contents of the screen. The call to cons_fill_area does this by filling the entire screen with
the color BLACK.


    /* draw tick marks for clock face */
    for(i = 0; i < 60; i++) {
        polar_point(i*POS_PER_TICMARK, OUTER_RADIUS, &x1, &y1);
        polar_point(i*POS_PER_TICMARK, INNER_RADIUS, &x2, &y2);

        if(i%5) color = GRAY;
        else color = WHITE;

        cons_draw_line(x1, y1, x2, y2, color);
    }


The next step in initialization is to draw the tick marks for the clock face. There will be
sixty tick lines drawn around the circle, one for each second. Every fifth tick mark will be
WHITE to mark the hour positions, and the rest will be GRAY. The lines are drawn using
the cons_draw_line library routine, which draws a one-pixel-wide line between two points
in the desired color.

The locations of the endpoints of our tick marks are calculated using a helper routine,
polar_point (shown below), which calculates the Cartesian coordinates for a given angle
and radius. We conveniently adopt integer angle positions starting from 0 at the top and
running clockwise around up to 60*POS_PER_TICMARK back at the top again.


    /* draw numerals for clock face */
    for(i = 1; i <= 12; i++) {
        sprintf(num_str, "%d", i);

        polar_point(i*5*POS_PER_TICMARK, NUMBER_RADIUS, &x1, &y1);
        y1 -= 8; x1 -= strlen(num_str)*8 / 2;   /* center the string */
        cons_print(x1, y1, num_str, strlen(num_str), YELLOW);
    }


The numerals are placed on the clock face similarly to the tick marks. The
cons_print graphics library function places text on the screen at a given position and color.


    /* initialize the hand positions to current time */
    maruti_get_current_time(&curtime);
    curdate = maruti_time_to_date(curtime);

    sec_pos = curdate.second * POS_PER_TICMARK;
    min_pos = curdate.minute * POS_PER_TICMARK +
              curdate.second * POS_PER_TICMARK / 60;
    hour_pos = (curdate.hour % 12) * 5 * POS_PER_TICMARK +
               curdate.minute * 5 * POS_PER_TICMARK / 60;

    return 0;
}


The final part of the initialization is the calculation of the initial placement of the
clock hands. The maruti_get_current_time system call returns the current system time,
given as an mtime structure. The system time is kept just as in Unix: as the number of
seconds and microseconds since the Epoch, defined as 00:00 GMT on January 1,
1970. The maruti_time_to_date library routine does the job of calculating the date and
time-of-day from an mtime value.
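
The position arithmetic in the initialization code can be tried out on its own. The following is a minimal sketch in plain C, not part of the clock module: the sketch_ names are hypothetical, and the constants are assumed to mirror NUM_POSITIONS and POS_PER_TICMARK as defined in clock.h.

```c
/* Hypothetical stand-ins for NUM_POSITIONS and POS_PER_TICMARK in clock.h. */
enum {
    SKETCH_NUM_POSITIONS   = 240,
    SKETCH_POS_PER_TICMARK = SKETCH_NUM_POSITIONS / 60
};

/* Second hand: one tick mark per second. */
int sketch_sec_pos(int second)
{
    return second * SKETCH_POS_PER_TICMARK;
}

/* Minute hand: one tick mark per minute, plus the part of the current
   minute that has already elapsed (integer division truncates). */
int sketch_min_pos(int minute, int second)
{
    return minute * SKETCH_POS_PER_TICMARK
         + second * SKETCH_POS_PER_TICMARK / 60;
}

/* Hour hand: five tick marks per hour on a 12-hour face, plus the part
   of the current hour that has already elapsed. */
int sketch_hour_pos(int hour, int minute)
{
    return (hour % 12) * 5 * SKETCH_POS_PER_TICMARK
         + minute * 5 * SKETCH_POS_PER_TICMARK / 60;
}
```

Under these assumptions, a time of 15:30:45 places the second, minute, and hour hands at positions 180, 123, and 70 respectively, out of 240 positions around the face.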


entry sec_hand()
{
    static int erase = 0;

    if(erase) xor_ray(sec_pos, WHITE);
    else erase = 1;

    sec_pos = (sec_pos + POS_PER_TICMARK) % NUM_POSITIONS;
    xor_ray(sec_pos, WHITE);

    check_for_quitkey();
}


The periodic function sec_hand will be run once per second. It erases the
previously placed second-hand ray, calculates the new position, and draws again there. The
check_for_quitkey subroutine (shown below) will poll the keyboard and exit the
application if the space bar is pressed.


entry min_hand()
{
    static int erase = 0;

    if(erase) xor_triangle(min_pos, MIN_RADIUS, MIN_COLOR);
    else erase = 1;

    min_pos = (min_pos + 1) % NUM_POSITIONS;
    xor_triangle(min_pos, MIN_RADIUS, MIN_COLOR);
}

entry hour_hand()
{
    static int erase = 0;

    if(erase) xor_triangle(hour_pos, HOUR_RADIUS, HOUR_COLOR);
    else erase = 1;

    hour_pos = (hour_pos + 1) % NUM_POSITIONS;
    xor_triangle(hour_pos, HOUR_RADIUS, HOUR_COLOR);
}


The min_hand and hour_hand periodic functions update their respective hand positions by
one each time they are called. The second hand jumps forward one second each time it is
called, but the minute and hour hands creep forward in smaller relative increments (rather
than jumping forward once per minute or hour, which would not look right).


void polar_point(int pos, int radius, int *x, int *y)
{
    double angle = (2.0*M_PI/NUM_POSITIONS) * (NUM_POSITIONS-pos) +
                   M_PI/2;

    *x = CENTER_X + cos(angle) * radius;
    *y = CENTER_Y - sin(angle) * radius;
}


Finally we come to the helper functions. The polar_point function converts from
our convenient "positions" to real angles in radians, taking into account that radians start at
the right and run counter-clockwise, whereas our positions start at the top and run
clockwise. Given an angle in radians and a radius from the center, the x and y coordinates
of the point are found by taking the cosine and sine of the angle. The final twist is that in
Cartesian coordinates, the y axis points up, whereas in screen coordinates it traditionally
points down, so the y coordinate must be flipped around.


void xor_ray(int pos, int color)
{
    int x, y;

    polar_point(pos, SEC_RADIUS, &x, &y);
    cons_xor_line(CENTER_X, CENTER_Y, x, y, color);
}

void xor_triangle(int pos, int apex_radius, int color)
{
    int xb1, yb1, xb2, yb2, xp, yp;
    int bp1, bp2;

    bp1 = (pos + TRIANGLE_BASEL/2) % NUM_POSITIONS;
    bp2 = (pos - TRIANGLE_BASEL/2) % NUM_POSITIONS;

    polar_point(bp1, TRIANGLE_BASER, &xb1, &yb1);
    polar_point(bp2, TRIANGLE_BASER, &xb2, &yb2);
    polar_point(pos, apex_radius, &xp, &yp);

    cons_xor_line(xb1, yb1, xb2, yb2, color);   /* base of triangle */
    cons_xor_line(xb1, yb1, xp, yp, color);     /* first arm */
    cons_xor_line(xb2, yb2, xp, yp, color);     /* second arm */
}


These  graphic  helper  routines  draw  the  line  for  the  second  hand  and  the  triangle  for 
the  minute  and  hour  hands.  The  cons_xor_line  routine  is  similar  to  cons_draw_line,  but 
exclusive-or's  its  pixels  with  the  screen  rather  than  just  painting  them.  The  xor  technique  is 
often  used  in  graphics  programming  because  it  allows  the  drawing  and  erasing  of  objects 
without disturbing the background. When multiple objects overlap, the overlapping
portions  may  become  a  strange  color  due  to  xor'ing,  but  you  are  guaranteed  that  when  the 
objects  are  erased  by  xor'ing  them  a  second  time  in  the  same  location,  whatever  color  was 
there  before  will  be  restored. 
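
The restore guarantee follows from the fact that xor'ing a pixel with the same value twice leaves it unchanged. The sketch below illustrates this on a tiny one-dimensional "screen" in plain C; the buffer, its size, and sketch_xor_span are hypothetical and do not represent the Maruti graphics library.

```c
#include <stddef.h>

enum { SKETCH_PIXELS = 8 };   /* assumed size of the toy frame buffer */

/* XOR the pixels in [from, to) with the given color.
   Calling it a second time with the same arguments undoes the first call. */
void sketch_xor_span(unsigned char *fb, size_t from, size_t to,
                     unsigned char color)
{
    for (size_t i = from; i < to && i < SKETCH_PIXELS; i++)
        fb[i] ^= color;
}
```

Drawing two overlapping spans and then xor'ing each of them a second time returns every pixel to its starting value, regardless of the order in which the spans are erased.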
void check_for_quitkey(void)
{
    console_event_t ev;

    if(cons_poll_event(&ev) != 0 && ev.device == EVENT_KEYBOARD
       && ev.keycode == KEY_SPACE)
        quit();
}


The final helper routine polls the console keyboard for events, and quits the
application if the space bar is pressed. The cons_poll_event system call reports both key
press and key release events, and reports a scan-code rather than an ASCII value. This
interface is rather low level, but allows the application complete access to the up/down state
of every key on the keyboard.

This completes the clock.mpl module. The clock.cfg config file follows:


#include "clock.h"
application clock {
    job secjob; init secjob: period SEC_PERIOD s;   /* jobs */
    job minjob; init minjob: period MIN_PERIOD s;
    job hourjob; init hourjob: period HOUR_PERIOD s;

    task ct; start ct: clock;   /* task */

    <> ct.sec_hand <> in secjob;   /* threads */
    <> ct.min_hand <> in minjob;
    <> ct.hour_hand <> in hourjob;
}


Notice  that  the  config  file  can  include  header  files  just  like  the  MPL  module  can. 
This  allows  the  programmer  to  put  configuration-related  constants  in  one  header  and  use 
them in both the config file and the application modules.

The  clock  config  simply  creates  one  task,  plus  a  job  for  each  hand  of  the  clock. 
The  periods  are  defined  in  “clock.h”: 


#define INNER_RADIUS 235
#define OUTER_RADIUS (INNER_RADIUS+15)
#define NUMBER_RADIUS (OUTER_RADIUS+15)

#define TRIANGLE_BASER 30
#define TRIANGLE_BASEL 50

#define SEC_RADIUS INNER_RADIUS

#define MIN_COLOR YELLOW
#define MIN_RADIUS (SEC_RADIUS-50)

#define HOUR_COLOR GREEN
#define HOUR_RADIUS (MIN_RADIUS-50)

#define NUM_POSITIONS 240
#define POS_PER_TICMARK (NUM_POSITIONS/60)

#define SEC_PERIOD 1 /* jumps 1 tickmark/sec */
#define MIN_PERIOD (60/POS_PER_TICMARK) /* creeps 1 tickmark/min */
#define HOUR_PERIOD (3600/5/POS_PER_TICMARK) /* creeps 5 tickmarks/hr */

First,  a  number  of  constants  describing  the  visual  appearance  of  the  clock  face  are 
defined.  These  can  be  modified  to  taste. 

Second, the timing characteristics of the program are given. The key parameter is
NUM_POSITIONS, which gives the number of positions which the minute and hour
hands take around the clock face. The larger this number, the smaller the distance the
hands move each time, and the more frequently their jobs are executed. The minute hand
must move through all 60 tick marks once every hour, and the hour hand 5 tick marks each
hour. With NUM_POSITIONS set to 240, each hand moves four times for each tick mark
on the face of the clock, which works out to one move every 15 seconds for the minute
hand, and one move every 180 seconds for the hour hand.
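
The 15-second and 180-second figures can be checked with a few lines of plain C. This is only a sketch of the arithmetic: sketch_move_period_s is a hypothetical helper, not something defined in clock.h, and it assumes that the number of positions is a multiple of 60 and that the result divides evenly.

```c
/* Seconds between successive moves of a hand that sweeps
   tickmarks_per_hour tick marks every hour, when the face is divided
   into num_positions positions (60 tick marks in total). */
int sketch_move_period_s(int num_positions, int tickmarks_per_hour)
{
    int pos_per_tickmark = num_positions / 60;   /* mirrors POS_PER_TICMARK */

    return 3600 / (tickmarks_per_hour * pos_per_tickmark);
}
```

With 240 positions the helper yields 15 for the minute hand (60 tick marks per hour) and 180 for the hour hand (5 tick marks per hour). Halving the number of positions to 120 doubles both periods.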


Chapter 3

MPL  /  C  References 


Maruti  Programming  Language  (MPL)  is  a  simple  extension  to  ANSI  C  to  support 
modules,  synchronization  and  communications  primitives,  and  shared  memory  variables. 
MPL  adds  some  restrictions  that  enable  analysis  of  the  CPU  and  memory  requirements  of 
the program. This chapter will define the MPL-specific features that differ from ANSI C.

3.1.  EBNF  Syntax  Notation 

In  this  manual,  syntax  is  given  in  Extended  Backus-Naur  Formalism  (EBNF).  In  this 
notation: 


•  literal  strings  are  quoted,  e.g.  'module'. 

•  other  terminal  symbols  are  bracketed,  e.g.  <module-name>. 

• X | Y denotes alternatives.

• {X} denotes zero-or-more.

• [X] denotes zero-or-one.

3.2. MPL Modules

The  module  is  the  compilation  unit  in  MPL.  It  is  presented  to  the  MPL  compiler  as  one 
file,  but  may  contain  normal  C  #include  directives  so  that  the  parts  of  the  module  can  be 
kept  as  distinct  files.  The  MPL  compiler  generates  a  binary  object  file  for  the  module,  as 
well  as  a  partial  EU  graph  file  for  the  module,  which  contains  information  about  the 
module  needed  by  the  Maruti  analysis  tools. 

At runtime, each MPL module is mapped to a Maruti task, which logically runs in
its  own  address  space.  Communication  between  tasks  is  through  channels  or  shared 
blocks.  Each  task  can  contain  multiple  threads  of  execution,  each  thread  corresponding  to 
an  entry  or  service  function  of  MPL. 

Each  module  starts  with  the  module  name  declaration: 


module_name_spec  ::=  'module'  <module-name>. 

3.3. Module Initialization

When the task corresponding to a module is loaded, the Maruti runtime system executes a
non-real-time  initializer  function  provided  by  the  programmer.  The  initializer  is  a  normal  C 
function,  but  it  must  be  present  in  every  module.  It  is  declared  as: 

int  maruti_main(int  argc,  char  **argv); 
The job of this function is to initialize the state of the task, taking any parameter
values into account. If the initializer returns 0, then the task is considered successfully
loaded; otherwise the load fails. The initializer thread cannot send or receive messages on
Maruti channels.

3.4.  Entry  Functions 

Maruti  entry  functions  occur  as  top-level  definitions  in  the  MPL  source  file,  similar  in 
syntax  to  normal  C  function  definitions. 


entry-function ::= 'entry' <entry-name> '(' ')' entry-function-body.
entry-function-body ::= channel-declaration-list c-function-body.

Entry functions serve as the top-level function of a Maruti thread which is invoked
repeatedly with a period as specified externally, in the MCL configuration. Multiple
instances of the entry thread can be active in a single task at runtime, so care must be taken
to protect accesses to shared data with a region or local_region construct.

3.5. Service Functions

Maruti  service  functions  also  occur  as  top-level  definitions  in  the  MPL  source  file. 

service-function ::= 'service' <service-name>
    '(' <in-channel-name> ':' <type_specifier> ',' <msg-ptr-name> ')'
    service-function-body.

service-function-body ::= channel-declaration-list c-function-body.

Services  are  declared  with  the  initiating  channel  and  pointer  to  a  message  buffer.  A 
service  thread  is  invoked  whenever  a  message  on  the  channel  has  been  received,  thus  it 
inherits  the  scheduling  characteristics  of  the  sender  to  the  channel.  Multiple  instances  of  the 
service  may  be  active  in  a  single  task  at  the  same  time,  servicing  messages  from  different 
senders,  so  care  must  be  taken  to  protect  accesses  to  shared  data  with  a  region  or 
local_region construct.

The receipt of the invoking message into private storage is automatic, and the
service function is called with a pointer to the message buffer. For example, given the
service declaration:


service  consumer(inch:  ch_type,  msg)  {  ...  } 


The  service  is  actually  invoked  as  if  it  were  a  C  function  declared: 
void consumer(ch_type *msg) { ... }
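
The automatic receipt can be pictured as a small wrapper that copies the incoming message into private storage and then calls the service body with a pointer to that copy. The following plain C sketch is purely illustrative: the message type, the recording variable, and sketch_invoke_service are all hypothetical, and the real invocation mechanism is internal to the Maruti runtime.

```c
/* Hypothetical message type standing in for ch_type. */
typedef struct {
    int value;
} sketch_ch_type;

/* Records what the service body last saw, so the effect is observable. */
static int sketch_last_seen;

/* The service body, written as the plain C function it is invoked as. */
void sketch_consumer(sketch_ch_type *msg)
{
    sketch_last_seen = msg->value;
}

/* Assumed shape of the invocation: receive into private storage,
   then hand the service a pointer to that private copy. */
void sketch_invoke_service(const sketch_ch_type *incoming)
{
    sketch_ch_type private_copy = *incoming;

    sketch_consumer(&private_copy);
}
```

Because the service only ever sees its private copy, anything it does to the message buffer cannot affect the sender's data.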
3.6. MPL Channels


In  Maruti,  channels  are  one-way,  typed,  communications  paths  whose  traffic  patterns  are 
analyzed  and  scheduled  by  the  system.  The  channel  end-points  are  declared  as  part  of  the 
entry  or  service  functions  which  take  part  in  the  communication.  The  endpoints  are 
connected  in  the  MCL  configuration  for  the  application. 


The  syntax  of  MPL  channels  is  similar  to  a  C  variable  declaration: 


channel-declaration-list ::= [ channel-declaration { channel-declaration } ].
channel-declaration ::= channel-type channel { channel }.
channel-type ::= 'out' | 'in' | 'in_first' | 'in_last'.
channel ::= <channel-name> ':' type-specifier.

A channel endpoint declaration will normally be either an out endpoint or an in
endpoint, used in the sending thread and receiving thread, respectively. There are two
special variants of the in endpoint, in_first and in_last, which denote asynchronous channels in
which the communications are not scheduled, and the input buffers are allowed to
overflow. For in_first channels, the first messages received will be retained and the rest
dropped; for in_last, the most recent messages will be retained and older messages
overwritten.
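
The difference between the two overflow policies can be modeled with a small fixed-size buffer. The sketch below is plain C and makes several assumptions that are not taken from the Maruti implementation: the buffer capacity of three, integer messages, and all of the sketch_ names are hypothetical.

```c
enum { SKETCH_CAPACITY = 3 };   /* assumed input buffer size */

typedef enum { SKETCH_IN_FIRST, SKETCH_IN_LAST } sketch_policy;

typedef struct {
    sketch_policy policy;
    int slots[SKETCH_CAPACITY];   /* oldest message at slots[0] */
    int count;                    /* number of valid messages */
} sketch_inbox;

/* Deliver one message, applying the overflow policy when the buffer is full. */
void sketch_deliver(sketch_inbox *box, int msg)
{
    if (box->count < SKETCH_CAPACITY) {
        box->slots[box->count++] = msg;
        return;
    }
    if (box->policy == SKETCH_IN_FIRST)
        return;   /* in_first: keep what arrived first, drop the newcomer */

    /* in_last: discard the oldest message and append the newest */
    for (int i = 1; i < SKETCH_CAPACITY; i++)
        box->slots[i - 1] = box->slots[i];
    box->slots[SKETCH_CAPACITY - 1] = msg;
}
```

Delivering the messages 1 through 5 to a buffer of three leaves 1, 2, 3 under the in_first policy and 3, 4, 5 under the in_last policy.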


3.7.  Communication  Primitives 

The message passing primitives appear as normal C function calls, but they are built-in
primitives of the MPL compiler, and their use is recorded so that the communications on
the channel can be analyzed.

The three primitives each take a channel name and a pointer to a message buffer.
Their declarations would look something like this:

void send (out ch_name, ch_type* message_ptr);
void receive (in ch_name, ch_type* message_ptr);
int optreceive(in ch_name, ch_type* message_ptr);

There  are  two  variants  of  the  receive  primitive.  A  normal  receive  is  used  in  most  cases,  and 
it  raises  an  exception  if  there  is  no  message  delivered  at  the  time  it  is  executed.  Normally 
the Maruti scheduler will arrange things so that this never happens. When messages
might  not  be  present  when  the  receiver  is  run,  as  when  threads  are  communicating 
asynchronously  with  in_first  and  in_last  channels,  or  when  the  sender  sometimes  will  not 
send  the  message  due  to  run  time  conditions,  an  optreceive  must  be  used.  The  optreceive 
variant  checks  if  a  message  is  present,  and  receives  it  if  so.  It  returns  1  if  a  message  was 
delivered,  or  0  if  no  message  was  delivered. 
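
The return convention of optreceive can be illustrated with a one-slot channel model in plain C. This is a sketch of the semantics described above only: sketch_channel and sketch_optreceive are hypothetical, and the real primitive is built into the MPL compiler rather than being an ordinary function.

```c
/* Hypothetical one-slot channel carrying integer messages. */
typedef struct {
    int has_message;   /* nonzero when a message has been delivered */
    int payload;
} sketch_channel;

/* Models optreceive: if a message is present, copy it out and return 1;
   otherwise return 0 and leave *out untouched. */
int sketch_optreceive(sketch_channel *ch, int *out)
{
    if (!ch->has_message)
        return 0;

    *out = ch->payload;
    ch->has_message = 0;   /* the message has been consumed */
    return 1;
}
```

A caller tests the return value before using the buffer, which is what allows a thread to run safely when the sender has chosen not to send.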
3.8. Critical Regions

Mutual  exclusion  is  often  necessary  to  prevent  the  corruption  of  data  structures  modified 
and  accessed  by  concurrent  threads.  In  Maruti,  the  region  statement  delineates  a  critical 
region. 

region-statement ::= ('region' | 'local_region') <region-name> c-statement.

The local_region variant is used within a task, usually to serialize multiple thread access to
data structures. The region variant is global to the application, and is used to serialize
access to shared buffers and other application-defined resources, as specified in the MCL
configuration for the application.

3.9.  Shared  Buffers 

Finally, MPL adds shared buffers to the C language. Shared buffer declarations are
similar  in  syntax  to  typedef  declarations: 

shared-buffer-decl  ::=  'shared'  <type-specifier>  <shared-buffer-name>. 

The  shared  buffer  declaration  is  effectively  a  pointer  declaration.  For 
example: 


shared  some_type  shared_buffer; 


is  treated  as  if  it  were  a  declaration  of  the  form: 


some_type  *shared_buffer  =  &some_buffer; 

The MCL specification for the application determines which tasks share each shared
memory area. The runtime system takes care of allocating memory for the shared buffers
and initializing the buffer pointers. The MPL program can at all times dereference the
pointer.
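
The pointer view of a shared buffer can be sketched in plain C as two pointers that the runtime binds to the same storage. Everything in this sketch is hypothetical (the type, the storage, the binding function, and the two per-task pointers); it only illustrates why a write made through one task's pointer is visible through the other's.

```c
/* Hypothetical buffer type standing in for some_type. */
typedef struct {
    int level;
} sketch_some_type;

/* Storage assumed to be allocated by the runtime system. */
static sketch_some_type sketch_some_buffer;

/* What each of two tasks sees after its 'shared' declaration. */
sketch_some_type *sketch_task_a_buffer;
sketch_some_type *sketch_task_b_buffer;

/* Assumed runtime step: point both tasks at the same shared area. */
void sketch_runtime_bind(void)
{
    sketch_task_a_buffer = &sketch_some_buffer;
    sketch_task_b_buffer = &sketch_some_buffer;
}
```

After binding, setting sketch_task_a_buffer->level also changes the value read through sketch_task_b_buffer, which is why accesses to shared buffers are normally wrapped in a region construct.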

3.10.  Restrictions  to  ANSI  C  in  MPL 

The  Maruti  real-time  scheduling  methodology  requires  that  the  tools  be  able  to  analyze  the 
control  flow  and  stack  usage  of  the  MPL  programs,  and  that  synchronization  points  be  well 
known.  Thus  the  following  restrictions  to  ANSI  C  must  be  followed  by  the  MPL 
programmer: 

•  No  receive  primitives  are  allowed  within  either  loops  or  conditionals. 

• No region constructs are allowed within either loops or conditionals.

•  No  send  primitive  within  a  loop. 

•  Direct  or  indirect  recursion  is  not  allowed. 

•  Function  calls  via  function  pointers  should  not  be  used. 
Chapter 4

MCL  Reference 

Maruti Configuration Language (MCL) is used to specify how individual program modules
are to be connected together to form an application and to specify the details of the
hardware platform on which the application is to be executed.

MCL is an interpreted C-like language. The MCL processor is called the
integrator. The integrator interprets the instructions of the MCL program, instantiating and
connecting the components of the application, checking for type correctness as it goes, and
outputs the application graph and all allocation and scheduling constraints for further
processing by other Maruti tools.

4.1.  Top-level  Declarations 

Like a C program, an MCL configuration file is composed of a number of top-level
declarations. The C preprocessor is invoked first, so the configuration file may contain
#include and #define directives to make the configuration very customizable.

configuration ::= { toplevel-declaration }.
toplevel-declaration ::= variable-declaration | system-declaration
    | block-declaration | application-declaration.

The declarations may occur in any order; they do not have to be defined before they are
used. The four types of top-level declaration are described in more detail below.

4.1.1.  The  Application  Declaration 

application-declaration  ::=  'application'  <application-name> 

'{'  {instruction} '}'. 

Like  the  main  function  of  C,  the  application  declaration  is  where  the  integrator  will  begin 
execution  of  the  configuration  directives.  Only  one  application  may  be  declared  in  the 
configuration. 


4.1.2.  The  System  Declaration 

system-declaration  ::=  'system'  <system-name> 

'{'  {node-declaration} '}'. 

node-declaration ::= 'node' <variable-name> ['with' attributes].
attributes ::= attribute {',' attribute}.

attribute ::= <symbol> ['=' <integer> | '=' <symbol> | '=' <string>].

Like  the  application  declaration,  the  system  declaration  can  occur  at  most  once  in  a 
configuration.  It  is  not  needed  for  single-node  operation.  The  system  declaration  names 
the  nodes  that  an  application  will  run  on,  and  specifies  attributes  for  them.  For  example: 
system hdw {
    node northstar with address = "{0x00,0x60,0x8c,0xb1,0xfb,0xc6}", master;
    node raduga with address = "{0x00,0x60,0x8c,0xb1,0xf6,0x67}";
}


The integrator does not assign any meaning to the attributes declared for the nodes;
it just passes the information along. However, the Maruti binder does require the
address attribute for each node, which specifies the node's Ethernet address, and the master
attribute on only one node, to specify which node will be the boot and time master. The
Maruti/Virtual environment further requires that the node <variable-name> correspond to
the hostname of the node in the testbed environment.

4.1.3. Block Declarations

block-declaration ::= 'block' <block-name> '(' [block-parameters] ')'
    block-parameter-channels
    '{' {instruction} '}'.

A block is something like a function in C. When a block is declared, it may be
called by any other block, except that no self-recursion is allowed. A block cannot be
declared inside another block. A block is called by giving its name and parameters. There
are two kinds of parameters: classical parameters and channel parameters.

block-parameters ::= parameter { ',' parameter }.
parameter ::= ['var'] <parameter-name> ['[]'] [':' type].

Classical parameters are like function parameters in C or Pascal. They can be
passed by value or by variable (var for variable passing). Arrays may also be given as
var parameters. The type of the parameter must be given for the first parameter. It may be
omitted for following parameters: the integrator will assume that a parameter with no
given type has the same type as the previous parameter.

block-parameter-channels ::= { ('in' | 'out') channel-names }.
channel-names ::= channel { ',' channel }.
channel ::= <channel-name> [ '[' <integer> ']' ].

The channel parameters describe the inputs and outputs of the block. The in and out
keywords do not have exactly the same meaning as in MPL: they only show which
channels are connected at the left and which are connected at the right of the block call (see
connection below). The communication type of the channel (in_first, in_last, or
synchronous) and the type of the messages on the channel are determined by the
connections of the channels to the tasks.

When there is an array of channel parameters, the connections will occur in
ascending order. For example:

block foo()
    in ch[3];
{ ... }

application bar {
    channel a[3];

    <a[0..2]> foo() <>; /* a[0]->ch[0], a[1]->ch[1], a[2]->ch[2] */
}


4.1.4. Variable Declarations

variable-declaration ::= type variable-names.

type ::= 'float' | 'int' | 'string' | 'time' | 'channel'
    | 'task' | 'job' | 'node' | 'shared' | 'region'.

variable-names ::= variable { ',' variable }.

variable ::= <variable-name> ['[' <integer> ']'] ['[' <integer> ']'].

Variables  may  be  declared  globally  at  the  top-level,  or  locally  in  a  block.  Global 
variables  can  be  accessed  in  all  blocks,  while  local  variables  can  only  be  accessed  in  the 
block  where  they  are  declared.  A  local  variable  (or  a  parameter)  may  be  declared  with  the 
same  name  as  a  global  variable.  In  this  case  only  the  local  variable  (or  the  parameter)  can 
be  accessed  in  the  block. 

The  order  of  the  variable  declarations  does  not  matter.  For  example: 


block foo()
{
    i = 4s + 5mn; /* correct */
    time i;
}


Arrays may be declared. As in C, the array indices are numbered from 0 to
size-of-array less 1. Arrays of 1 or 2 dimensions are accepted. For example:


block foo()
{
    string s[10];

    s[5] = "a string"; /* correct */
    s[0] = s[5] + " foo"; /* correct */
    s[10] = ""; /* incorrect: out of array limits */
}


4.2. Instructions

The  MCL  integrator  interprets  a  number  of  instructions  that  express  the  way  an  application 
is  to  be  built  up  from  components.  The  different  instructions  are  explained  below. 

instruction ::= variable-declaration
    | task-initialization
    | job-initialization
    | connect-declaration
    | link-instruction
    | allocation-instruction
    | expression
    | print-instruction
    | compound-instruction
    | {instruction}.

4.2.1.  Compound  Instructions 

compound-instruction ::=
    'if' '(' test-expression ')' instruction
    | 'if' '(' test-expression ')' instruction 'else' instruction
    | 'do' instruction 'while' '(' test-expression ')' ';'
    | 'while' '(' test-expression ')' instruction
    | 'for' '(' expression ';' test-expression ';' expression ')'
      instruction.

test-expression ::= expression.

The meaning of these constructs is the same as in the C language. The test-expression
should evaluate to an integer, where 0 means false, and all other values mean
true.

4.2.2.  Tasks 


task-initialization  ::=  'start'  names  ':'  <module-name>  [module-parms] 

[instantiation]  [task-allocation] ';' 

module-parms  ::=  '('  [module-parameter-list] ')'. 
module-parameter-list  ::=  expression  {','  expression}. 

instantiation  ::=  'with'  <symbol>  '='  constant 
{','  <symbol>  '='  constant }. 

task-allocation  ::=  'on'  expression. 

A variable of type task must be initialized before it can be used. This initialization consists
of giving the name of a module: the task will be an instantiation of this module. Module
parameters may be given: after evaluation they will be given to the initializer thread of the
module. The initializations during the loading of an application will take place in exactly
the same order as they are found by the integrator during the execution of the
configuration.

All the shared buffers and the global regions of the module must be instantiated
using the with clause: the corresponding shared or region variables must be given.

The on clause may be used to force allocation of the task on a particular node.

4.2.3.  Job  Initialization 

job-initialization  ::=  'init'  names  ':'  timing-job 
timing-job  ::=  {  'period'  expression  }. 

A variable of type job must be initialized before it can be used. The job will refer to a
collection of threads with the same period.

4.2.4. Connections

connect-declaration ::= chan-list connect-name chan-list
    [in-job] {timing-service} [task-alloc] ';'.
chan-list ::= '<' [names] '>'.

connect-name ::= <task-name> ['[' expression ']'] '.' <routine-name>
    | <block-name> '(' [expression {',' expression}] ')'.
in-job ::= 'in' constant.

timing-service ::= ( 'ready' expression | 'deadline' expression ).

There are two types of connections: routine connections and block connections. In both
cases the inputs are connected (or mapped) to the channels declared at the left of the
connection and the outputs to those at the right. The number of input (or output) channels
must be the same as in the definition of the routine (or the block). The mapping follows
the order of that definition.

In a routine connection the inputs and outputs of an entry or a service of a task are
connected to channels. If the routine is a service, the connection creates a new instance
of the service; otherwise it creates the only instance of the entry. An entry cannot be
connected more than once.

For an entry connection a job name must be given; the entry will be a part of this
job. For a service, a job cannot be declared, because the job is given implicitly by the
connection: the first input channel of a service is its triggering channel, and the job of
the service is the same as the job of the origin of the triggering channel.

A timing characterization may only be given to a routine connection.

In a block connection the input and output channels of the block are mapped to the
given channels. The block parameters are mapped as well, following the order in the
block definition. The number of parameters must be the same as in this definition, and
all the types must be coherent.

4.2.5.  Allocation  Instructions 

allocation-instruction ::= 'separate' '(' names ')' ';'
                         | 'together' '(' names ')'

A separate instruction is a command to the allocator to keep the tasks on different nodes in
the final system. A together instruction specifies that all tasks must be allocated to the
same node.


4.2.6.  Link  Instruction 

link-instruction ::= 'link' expression 'to' expression

In a few cases connections alone are not sufficient to describe a communication graph with
the structure of the blocks. In these cases a link instruction may be used.

A link between two channels means that the two channels are the same.

Example: to connect an input channel of a block directly to one of its output channels,
a link must be used.

block foo()
    in in_channel;
    out out_channel;
{
    link in_channel to out_channel;
}


4.2.7.  Print  Instruction 


print-instruction ::= 'print' '(' expression {',' expression} ')'

The  print  instruction  outputs  messages  to  the  standard  output  during  integration.  This 
instruction  can  be  used  for  the  debugging  of  a  configuration  file.  Any  string,  number,  or 
time  may  be  printed.  A  newline  is  added  at  the  end. 

4.3. Expressions

Expressions  in  MCL  are  very  similar  to  C  expressions: 


expression ::= expression '=' expression
             | expression '||' expression
             | expression '&&' expression
             | expression '==' expression
             | expression '!=' expression
             | expression '<' expression
             | expression '>' expression
             | expression '<=' expression
             | expression '>=' expression
             | expression ('d' | 'h' | 'mn' | 's' | 'ms' | 'us')
             | expression '+' expression
             | expression '-' expression
             | expression '*' expression
             | expression '/' expression
             | expression expression
             | '!' expression
             | '(' expression ')'
             | expression '++'
             | expression '--'
             | constant

constant ::= <symbol> [ '[' expression ']' ] [ '[' expression ']' ]
           | <symbol> '[' expression '..' expression ']' [ '[' expression ']' ]
           | <symbol> '[' expression ']' '[' expression '..' expression ']'
           | <integer>
           | <double>
           | <string>.

In addition to the usual C expressions, MCL supports time unit expressions; for example,
‘3 s + 500 ms’ is a time expression that evaluates to 3.5 seconds.

Also, MCL supports array range notation as a shorthand for lists. For example, the
expression ‘c[2..4]’ is shorthand for ‘c[2], c[3], c[4]’. This notation is most often used for
passing arrays of channel values to blocks or in connection instructions.
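
The two notations above can be illustrated with a minimal C sketch. It is not taken from the MCL integrator: the helper names unit_to_us, time_expr_us, and expand_range are hypothetical, and the reading of the suffixes 'mn' as minutes and 'd' as days is an assumption based on the grammar.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: microseconds represented by one MCL time-unit suffix.
   The meanings of 'mn' (minutes) and 'd' (days) are assumed.
   Returns -1 for an unknown suffix. */
long long unit_to_us(const char *unit)
{
    if (strcmp(unit, "us") == 0) return 1LL;
    if (strcmp(unit, "ms") == 0) return 1000LL;
    if (strcmp(unit, "s")  == 0) return 1000000LL;
    if (strcmp(unit, "mn") == 0) return 60LL * 1000000LL;
    if (strcmp(unit, "h")  == 0) return 3600LL * 1000000LL;
    if (strcmp(unit, "d")  == 0) return 86400LL * 1000000LL;
    return -1;
}

/* Value of "<count> <unit>" in microseconds (unit must be a known suffix). */
long long time_expr_us(long long count, const char *unit)
{
    return count * unit_to_us(unit);
}

/* Expand name[lo..hi] into the list "name[lo], ..., name[hi]".
   Returns the number of elements written; stops early if the buffer is full. */
int expand_range(const char *name, int lo, int hi, char *out, size_t outsz)
{
    size_t used = 0;
    int count = 0;
    int i;

    if (outsz == 0) return 0;
    out[0] = '\0';
    for (i = lo; i <= hi; i++) {
        int n = snprintf(out + used, outsz - used, "%s%s[%d]",
                         count ? ", " : "", name, i);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written element */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With these helpers, time_expr_us(3, "s") + time_expr_us(500, "ms") gives 3500000 microseconds, and expand_range("c", 2, 4, ...) produces the text c[2], c[3], c[4].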

CHAPTER 5

Maruti  Runtime  System  Reference 

The Maruti runtime system is bound together with the application binary files by the mbind
utility. Only those parts of the runtime needed by the application are linked in. There are
several versions of the runtime system available depending on the environment in which the
application will be run. For example, there are two different versions of the core library: a
stand-alone version that can boot directly on bare hardware, and a Unix version that runs as
a user-level process under Unix, providing virtual-time execution and access to debugging
tools.

The set of library versions that an application links with is called a flavor. Flavors
are specified by the programmer as strings of library names separated by a ‘+’, for
example, ‘ux+x11’.
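
As an illustration of the flavor notation, the following minimal sketch splits a flavor string on ‘+’. The function split_flavor and the constant FLAVOR_NAME_MAX are hypothetical names introduced here; nothing in this report says that mbuild or mbind parse flavors this way.

```c
#include <stddef.h>
#include <string.h>

#define FLAVOR_NAME_MAX 16   /* hypothetical limit on one library name */

/* Split a flavor string such as "ux+x11" into its library names.
   Returns the number of names stored (at most max_parts), or -1 if a
   name is empty or too long. */
int split_flavor(const char *flavor, char parts[][FLAVOR_NAME_MAX], int max_parts)
{
    int count = 0;
    const char *p = flavor;

    while (*p != '\0' && count < max_parts) {
        const char *plus = strchr(p, '+');
        size_t len = plus ? (size_t)(plus - p) : strlen(p);

        if (len == 0 || len >= FLAVOR_NAME_MAX) return -1;
        memcpy(parts[count], p, len);
        parts[count][len] = '\0';
        count++;

        if (plus == NULL) break;          /* last name consumed */
        if (plus[1] == '\0') return -1;   /* trailing '+' leaves an empty name */
        p = plus + 1;
    }
    return count;
}
```

For the example in the text, split_flavor("ux+x11", parts, 4) stores the two names "ux" and "x11".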

5.1.  Core  Library  Reference 


#include  <maruti/maruti-core.h> 

The Maruti core library implements the scheduling, thread and memory management, and
network communication subsystems. It provides primitives for applications to send and
receive messages, insert preemption points, manipulate the schedule (via calendars), and do
time and date calculations. There are currently two flavors of the core library:

• sa - The Maruti/Standalone core library. Applications linked with this flavor can be
booted directly (by the NetBSD boot blocks). It includes the distributed operation
support, based on the 3Com 3c507 Etherlink/16 adapter.

• ux - The Maruti/Virtual debugging core library. Applications linked with this flavor are
run as normal Unix processes from the NetBSD command line. It includes a virtual-time
scheduler and debugging monitor (described below) and implements distributed
operation using normal Unix TCP/IP networking facilities.


5.1.1.  MPL  Built-in  Primitives 

void  maruti_eu(void) 

The maruti_eu primitive inserts a Maruti EU break into the program at the location of the
call. It is not normally used explicitly in an application, as the system tools put EU breaks
where necessary for synchronization. It is useful, however, for breaking up long-running
EUs: the maruti_eu call then serves as a possible preemption point.


void send(out ch_name, ch_type* message_ptr)
void receive(in ch_name, ch_type* message_ptr)
int optreceive(in ch_name, ch_type* message_ptr)

The communications primitives are documented in section 3.7. in the MCL Reference
Chapter.

5.1.2.  Calendar  Switching 

int  maruti_calendar_activate(int  calendar_num,  mtime  switch_time,  mtime  offset_time) 
void maruti_calendar_deactivate(int calendar_num, mtime switch_time)

Maruti calendars may be activated and deactivated (switched on or off) at any time.
The switch_time is the time at which the activation or deactivation should take place. The
switch can occur at any point in the future, and switch requests can arrive out of order with
respect to their switch times. Requests with the same switch time are executed in the order
in which they were made.
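
The ordering rule for switch requests can be sketched as a stable insertion into a queue sorted by switch time. This is an illustration only, not the runtime's actual data structure; switch_req, switch_queue, make_req, queue_insert, and MAX_REQUESTS are hypothetical names, and times are simplified to plain microsecond counts instead of mtime values.

```c
#define MAX_REQUESTS 16   /* hypothetical queue capacity */

typedef struct {
    long switch_time_us;  /* when the switch should take effect */
    int  calendar_num;
    int  activate;        /* 1 = activate, 0 = deactivate */
} switch_req;

typedef struct {
    switch_req items[MAX_REQUESTS];
    int count;
} switch_queue;

/* Convenience constructor for a request. */
switch_req make_req(long switch_time_us, int calendar_num, int activate)
{
    switch_req r;
    r.switch_time_us = switch_time_us;
    r.calendar_num = calendar_num;
    r.activate = activate;
    return r;
}

/* Insert a request so the queue stays sorted by switch time.
   Only strictly later requests are shifted, so requests with equal
   switch times keep the order in which they were made.
   Returns the insertion index, or -1 if the queue is full. */
int queue_insert(switch_queue *q, switch_req r)
{
    int pos;

    if (q->count >= MAX_REQUESTS) return -1;
    pos = q->count;
    while (pos > 0 && q->items[pos - 1].switch_time_us > r.switch_time_us) {
        q->items[pos] = q->items[pos - 1];
        pos--;
    }
    q->items[pos] = r;
    q->count++;
    return pos;
}
```

Requests may therefore be submitted in any order; the queue yields them by switch time, with ties resolved in request order.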

Calendars can be activated with a particular offset_time, which is the relative
position within the calendar at which to start executing at the switch time. The offset time
will normally be zero, but can be any relative time up to the lcm_time of the calendar.

The  runtime  system  does  not  check  the  feasibility  of  the  combined  schedules 
represented  by  the  calendars  -  that  should  be  done  offline. 

5.1.3.  Calendar  Modification 

void maruti_calendar_create(int calendar_num, int num_entries, mtime lcm_time)
void maruti_calendar_delete(int calendar_num)

Calendars are normally created offline and compiled into the Maruti application, but
it is possible to create new calendars at runtime. The application is responsible for ensuring
that the generated schedules are feasible.

When a calendar is created, the maximum number of entries it will contain must be
specified, as well as the lcm_time, which is the period of the calendar as a whole. At the
end of its period, the calendar will wrap around and begin executing from the beginning
again.


typedef struct calendar_s {
    entry_t *entries;
    int      num_entries;
    mtime    lcm_time;
    mtime    base_time;
    entry_t *cur;          /* cur - entries is the current offset */
    <...>
} calendar_t;

typedef struct {
    int   eu_thread, eu_id;
    mtime eu_start, eu_deadline;
    int   eu_type;
#   define EU_EMPTY    0   /* empty EU slot */
#   define EU_PERIODIC 1   /* periodic EU */
    <...>
} entry_t;

void maruti_calendar_get_header(int calendar_num, calendar_t *calendarp)
void maruti_calendar_get_entry(int calendar_num, int entry_num, entry_t *entryp)
void maruti_calendar_set_entry(int calendar_num, int entry_num, entry_t entry)

The maruti_calendar_set_entry call is used to populate new calendars. It can
overwrite any entry in any inactive calendar. The entry eu_start and eu_deadline times are
the earliest start time and latest end time, respectively. The eu_id serves to identify the EU
when tracing or reporting timing results.

The maruti_calendar_get_header and maruti_calendar_get_entry calls can be used to
query the contents of a calendar. These are useful when ‘cloning’ an existing calendar into
a new calendar, perhaps with modifications.

5.1.4.  Date  and  Time  Manipulation 

#include  <maruti/mtime.h> 

The  Maruti  core  library  provides  routines  and  macros  for  simple  time  and  date  calculations. 


typedef  struct  { 
long  seconds; 
long  microseconds; 
}  mtime; 


#define time_cmp(a,b)          /* like strcmp: 0 if equal, less than 0 if a < b, etc */
#define time_add(a,b)          /* a += b */
#define time_sub(a,b)          /* a -= b */
#define time_add_scalar(t, s)  /* t += s (s is an int, in microseconds) */
#define time_sub_scalar(t, s)  /* t -= s (s is an int, in microseconds) */
#define time_mul_scalar(t, s)  /* t *= s (s is an int) */
#define time_div_scalar(t, s)  /* t /= s (s is an int) */


The  mtime  type  is  the  basic  Maruti  time  structure.  A  number  of  convenience  macros  for 
arithmetic  on  mtime  values  are  provided.  Two  mtime  values  may  be  compared,  added,  or 
subtracted.  In  addition,  an  integer  time  in  microseconds  may  be  added  to  and  subtracted 
from  an  mtime  value,  and  mtime  values  may  be  multiplied  or  divided  by  integer  scaling 
factors. 

Note: The microseconds field is always in the range 0 to 999999, and the time
represented by an mtime value is always the number of seconds plus the number of
microseconds. These rules hold even for negative mtime values, which can arise when
subtracting mtimes. Thus the mtime representation for the time -1.3 seconds is { -2,
700000 }.
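
The normalization rule in the note can be checked with a short sketch. The type mtime_sketch mirrors the two fields of mtime but is a stand-in defined here, and mtime_from_us and mtime_sub_sketch are hypothetical helpers, not the library's time_sub macro.

```c
/* Stand-in for the Maruti mtime structure (same two fields). */
typedef struct {
    long seconds;
    long microseconds;   /* always kept in 0..999999 */
} mtime_sketch;

/* Build a normalized value from a signed count of microseconds. */
mtime_sketch mtime_from_us(long long us)
{
    mtime_sketch t;
    long long sec = us / 1000000;   /* C division truncates toward zero */
    long long rem = us % 1000000;   /* so rem may be negative here */

    if (rem < 0) {                  /* borrow one second to make rem >= 0 */
        rem += 1000000;
        sec -= 1;
    }
    t.seconds = (long)sec;
    t.microseconds = (long)rem;
    return t;
}

/* a - b for two normalized values, with the result normalized again. */
mtime_sketch mtime_sub_sketch(mtime_sketch a, mtime_sketch b)
{
    mtime_sketch r;

    r.seconds = a.seconds - b.seconds;
    r.microseconds = a.microseconds - b.microseconds;
    if (r.microseconds < 0) {
        r.microseconds += 1000000;
        r.seconds -= 1;
    }
    return r;
}
```

Both mtime_from_us(-1300000) and the subtraction 1.2 s - 2.5 s yield { -2, 700000 }, matching the representation of -1.3 seconds given above.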


void  maruti_get_current_time(mtime  *curtime) 

The  current  system  time  is  returned  by  maruti_get_current_time.  Maruti,  like  Unix, 
represents  absolute  time  as  the  number  of  seconds  and  microseconds  since  the  Epoch  time, 
defined  as  00:00  GMT  on  January  1, 1970. 

typedef struct {
    short year;             /* year - 1900 */
    short month;            /* month (0..11) */
    short wday;             /* day of week (0..6) */
    short mday;             /* day of month (1..31) */
    short yday;             /* day of year (0..365) */
    short second, minute;   /* 0..59 */
    short hour;             /* 0..23 */
    int   microsecond;      /* 0..999999 */
} mdate;


mtime  maruti_date_to_time(mdate  d) 
mdate  maruti_time_to_date(mtime  t) 


mtime  maruti_gmtdate_to_time(mdate  d) 
mdate  maruti_time_to_gmtdate(mtime  t) 


int maruti_set_gmtoff(int gmtoff)
int  maruti_get_gmtoff(int  *gmtoffp) 

Applications  will  often  want  to  view  the  time  as  something  more  convenient  than  the 
number  of  seconds  since  the  Epoch.  The  Maruti  mdate  type  denotes  a  time  expressed  as  a 
date  plus  a  time  of  day.  The  functions  maruti_time_to_gmtdate  and 
maruti_gmtdate_to_time  convert  between  mtime  and  mdate  values  using  the  GMT 
timezone.  The  functions  maruti_time_to_date  and  maruti_date_to_time  convert  using  the 
local  offset  from  GMT. 

The local timezone used in these conversions is initially set by the runtime system,
but may be changed by the application. The timezone is expressed as an offset from GMT
in seconds. For example, the U.S. timezone EST is 5 hours behind GMT, or -18000
seconds offset.
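
The effect of the GMT offset can be sketched as follows. This is not the library's conversion code; local_seconds_of_day is a hypothetical helper that only computes the local time of day from an absolute time in seconds and an offset in seconds.

```c
#define SECONDS_PER_DAY 86400L

/* Seconds since local midnight for an absolute time (seconds since the
   Epoch) and a timezone offset from GMT in seconds (negative = behind GMT). */
long local_seconds_of_day(long long epoch_seconds, long gmtoff_seconds)
{
    long long local = epoch_seconds + gmtoff_seconds;
    long long sod = local % SECONDS_PER_DAY;

    if (sod < 0) sod += SECONDS_PER_DAY;   /* keep the result in 0..86399 */
    return (long)sod;
}
```

With the EST offset of -18000 seconds, the Epoch itself (00:00 GMT) maps to 68400 seconds after local midnight, that is 19:00 on the previous day.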

Note: Maruti does not at this time attempt to handle leap seconds or to switch the
local timezone automatically to account for daylight saving time. The cost of providing
these features in code and table space was deemed prohibitive.

5.1.5.  Miscellaneous  Functions 

void  quit(int  exitcode) 

The  quit  call  terminates  the  application.  The  exit  code  is  not  usually  relevant  in  an 
embedded  system,  but  will  be  returned  to  the  environment  where  that  makes  sense  (such  as 
in  the  Unix  debugging  environment). 

5.2. Console Library Reference


#include <maruti/console.h>

The Maruti console graphics library provides access to the console device,
including the keyboard and speaker, but most importantly the graphical display. The
graphics library includes support for placing text anywhere on the screen, simple 2D
geometry primitives suitable for generating line and bar graphs, and optimized
routines for moving bitmaps without flicker, for animated simulations. There are currently
three flavors of the graphics library implemented:


• et4k - This flavor supports Super VGA graphics cards based on the Tseng Labs
ET4000 chip and its accelerated descendants, like the ET4000/W32. The et4k
flavor runs the screen at a resolution of 1024x768 in 256 color mode.


• vga16 - This flavor supports all standard VGA graphics cards, running the screen
at a resolution of 640x480, in 16 color banked mode.


• x11 - This flavor works with the ux core flavor, displaying the Maruti screen
in an X11 window under Unix.

5.2.1.  Screen  Colors 

The Maruti console graphics library supports the following colors, defined in
<maruti/console.h>:


#define BLACK        0
#define DARK_BLUE    1
#define DARK_GREEN   2
#define DARK_CYAN    3
#define DARK_RED     4
#define DARK_VIOLET  5
#define DARK_YELLOW  6
#define DARK_WHITE   7
#define BROWN        8
#define BLUE         9
#define GREEN        10
#define CYAN         11
#define RED          12
#define VIOLET       13
#define YELLOW       14
#define WHITE        15
/* aliases */
#define GREY         DARK_WHITE
#define GRAY         DARK_WHITE

The maximum screen size supported is also defined:

#define CONSOLE_WIDTH   1024
#define CONSOLE_HEIGHT  768

5.2.2.  Graphics  Functions 

void  cons_graphics_init(void) 

The  cons_graphics_init  function  must  be  called  before  any  other  graphics  functions, 
usually  from  the  maruti_main  function  of  the  application's  screen  driver  task. 

void  cons_fill_area(int  x,  int  y,  int  width,  int  height,  int  color) 
void  cons_xor_area(int  x,  int  y,  int  width,  int  height,  int  color) 

These functions paint an area of the screen, specified by its upper-left coordinates
(x, y) and its width and height, in the given color. The cons_fill_area variant overwrites
the previous contents of that area of the screen, while cons_xor_area exclusive-ors the
screen contents with the specified color.

void cons_draw_pixel(int x, int y, int color)
void cons_xor_pixel(int x, int y, int color)

These functions draw and xor, respectively, a single pixel at (x, y) in the specified
color.

void cons_draw_line(int x1, int y1, int x2, int y2, int color)
void cons_xor_line(int x1, int y1, int x2, int y2, int color)

These functions draw and xor, respectively, a single-pixel width line from
coordinates (x1, y1) to (x2, y2) in the specified color.

void cons_draw_bitmap(int x, int y, int width, int height,
void *bitmap, int color)

void cons_xor_bitmap(int x, int y, int width, int height,
void *bitmap, int color)

These functions draw and xor, respectively, a width-by-height sized bitmap onto
the screen in the specified color, with its upper-left corner at (x, y). The bitmap is in
standard X bitmap format, with eight pixels per byte, and an even multiple of eight pixels
per scan line.
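
The bitmap layout described above can be sketched with a few helpers. They are illustrative only (bitmap_row_bytes, bitmap_size_bytes, and bitmap_get_pixel are hypothetical names, not part of the console library) and assume the usual X bitmap bit order, in which the leftmost pixel of each byte is the least significant bit.

```c
#include <stddef.h>

/* Bytes per scan line: each row is padded to a whole number of bytes. */
size_t bitmap_row_bytes(int width)
{
    return (size_t)(width + 7) / 8;
}

/* Total storage needed for a width-by-height bitmap. */
size_t bitmap_size_bytes(int width, int height)
{
    return bitmap_row_bytes(width) * (size_t)height;
}

/* Read pixel (x, y): 1 if set, 0 if clear.
   Assumes LSB-first bit order within each byte. */
int bitmap_get_pixel(const unsigned char *bitmap, int width, int x, int y)
{
    const unsigned char *row = bitmap + (size_t)y * bitmap_row_bytes(width);
    return (row[x / 8] >> (x % 8)) & 1;
}
```

A 10-pixel-wide bitmap therefore occupies two bytes per scan line, with the upper six bits of the second byte unused.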


void cons_move_bitmap(int x1, int y1, int x2, int y2, int width, int height,
void *bitmap, int color)

void cons_xor_move_bitmap(int x1, int y1, int x2, int y2, int width, int height,
void *bitmap, int color)

These functions optimize the erasing and redrawing of a bitmap by combining the
operations into one loop, modifying one scan-line at a time. This optimization eliminates
the flicker that can occur when erasing the entire bitmap and then redrawing it, making
animations more effective.

The call cons_move_bitmap(x1,y1,x2,y2,w,h,b,c) is equivalent to the sequence:


cons_draw_bitmap(x1,y1,w,h,b,BLACK);

cons_draw_bitmap(x2,y2,w,h,b,c);

The call cons_xor_move_bitmap(x1,y1,x2,y2,w,h,b,c) is equivalent to the sequence:


cons_xor_bitmap(x1,y1,w,h,b,c);

cons_xor_bitmap(x2,y2,w,h,b,c);
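
Why the xor variant erases the old image can be shown on a small in-memory model. This sketch does not use the console library: framebuffer, fb_clear, fb_xor_area, and fb_xor_move_area are hypothetical, and a solid rectangle stands in for a bitmap.

```c
#define FB_W 8
#define FB_H 4

/* Tiny stand-in for the screen: one color value per pixel. */
typedef struct {
    unsigned char px[FB_H][FB_W];
} framebuffer;

/* Fill the whole model screen with one background color. */
void fb_clear(framebuffer *fb, int color)
{
    int r, c;
    for (r = 0; r < FB_H; r++)
        for (c = 0; c < FB_W; c++)
            fb->px[r][c] = (unsigned char)color;
}

/* Exclusive-or a w-by-h rectangle at (x, y) with color; clips to the screen. */
void fb_xor_area(framebuffer *fb, int x, int y, int w, int h, int color)
{
    int r, c;
    for (r = y; r < y + h; r++)
        for (c = x; c < x + w; c++)
            if (r >= 0 && r < FB_H && c >= 0 && c < FB_W)
                fb->px[r][c] ^= (unsigned char)color;
}

/* Xor at the old position, then at the new one. Xoring the same color
   twice restores the original pixel, so the old image disappears. */
void fb_xor_move_area(framebuffer *fb, int x1, int y1, int x2, int y2,
                      int w, int h, int color)
{
    fb_xor_area(fb, x1, y1, w, h, color);
    fb_xor_area(fb, x2, y2, w, h, color);
}
```

Unlike the draw variant, which repaints the old position in BLACK, the xor variant restores whatever background was under the image.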


void cons_puts(int x, int y, int color, char *string)
void cons_xor_puts(int x, int y, int color, char *string)

These  functions  draw  and  xor,  respectively,  a  text  string  at  (x,  y)  in  the  specified  color. 

5.2.3.  Keyboard  and  Speaker  functions 

typedef struct
{
    unsigned char device;      /* just keyboard works for now */
#   define EVENT_OTHER    0
#   define EVENT_KEYBOARD 2
    unsigned char keycode;
} console_event_t;

int cons_poll_event(console_event_t *event)

The cons_poll_event call returns 1 if a console event has occurred, 0 otherwise.
When there is a pending console event, the event structure is filled in. The device field is
set to EVENT_KEYBOARD and the keycode field is set to the scan code of the key that
was pressed. The list of scan codes is in <maruti/keycodes.h>.


void  cons_start_beep(int  pitch) 
void  cons_stop_beep(void) 

The console speaker can be turned on and off with these functions. The
cons_start_beep call programs the speaker to sound at a particular frequency, in hertz,
and cons_stop_beep turns it off.

5.3. Maruti/Virtual Monitor

The ux flavor of the Maruti core library includes some basic debugging facilities called the
Maruti monitor. While an application compiled with ux is running, aspects of its execution
can be controlled from the Unix tty (which will be distinct from the console keyboard
device). The monitor provides the following facilities:

Tracing  scheduler  actions.  The  user  can  independently  toggle  the  tracing  of  elemental  unit 
executions,  calendar  wrap-around  events,  and  calendar-switch  events. 


Single-stepping  calendars  or  elemental  units.  The  user  can  toggle  single  stepping 
through  each  elemental  unit  execution,  or  a  whole  calendar's  execution. 

Controlling  virtual-time  execution  speed.  The  user  can  control  the  speed  of  the 
application  in  two  ways.  First,  the  user  can  toggle  as-soon-as-possible  execution  of 
elemental  units,  called  asap  mode.  Second,  the  user  can  set  the  speed  at  which  virtual  time 
advances  relative  to  real  clock  time. 

Both single-keystroke and command-line operation. All monitor switches may be
toggled with a single keystroke while the application continues running. Also, the user can
enter a command-line mode in which various parts of the system state may be queried and
modified.

5.3.1.  Controlling  Virtual  Time 

The Maruti monitor contains a user-settable speed variable which determines the rate at
which virtual time advances relative to the actual clock time.

The speed may be set to any floating point value greater than zero. Thus virtual
time may be set to run, for example, five times faster than clock time (speed = 5) or
four times slower (speed = 0.25). The speed is logically limited on the high side by the
utilization of the CPU: the execution of application code cannot be sped up, only the idle
time between executions.
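
One simplified reading of this limit is that the achievable speed-up cannot exceed the reciprocal of the CPU utilization, because only idle time can be compressed. The sketch below encodes that reading; effective_speed and wall_time_us are hypothetical helpers, and the model is an assumption, not a description of the monitor's implementation.

```c
/* Achievable virtual-time speed under the assumed model: the requested
   speed, capped at 1/utilization (utilization is the fraction of a
   period spent executing application code, in 0..1). */
double effective_speed(double requested_speed, double utilization)
{
    double limit;

    if (utilization <= 0.0) return requested_speed;   /* nothing to cap */
    limit = 1.0 / utilization;
    return requested_speed < limit ? requested_speed : limit;
}

/* Wall-clock time needed to run one period of virtual time. */
double wall_time_us(double period_us, double requested_speed, double utilization)
{
    return period_us / effective_speed(requested_speed, utilization);
}
```

For example, under this model an application that keeps the CPU busy half of the time cannot run faster than twice clock speed, whatever speed is requested.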

Idle time can be eliminated completely by turning on as-soon-as-possible
scheduling of elemental units (asap-mode). In asap-mode the virtual time is advanced to
the start time of the next elemental unit as soon as the previous one completes, resulting
in the execution of all EUs in immediate succession. Asap-mode is separate from the
speed variable: it can be toggled independently, and when turned off, scheduling
continues at the previously set speed.

5.3.2.  Single-Keystroke  Operation 

The  following  keys  are  active  from  the  Unix  tty  session  (not  the  console  keyboard)  while 
the  application  is  running: 

?      show the list of keystrokes and current values for the toggle switches.
a      toggle as-soon-as-possible mode.
e      toggle elemental unit tracing.
c      toggle calendar tracing.
X      toggle calendar-switch tracing.
s      toggle elemental unit single-stepping.
S      toggle calendar single-stepping.
q      quit application completely.
<ESC>  stop application and enter command-line mode.

5.3.3. Command-line Operation

The following commands are available from command-line mode. At this time,
command-line mode is just a framework with only a few commands. More commands to
query and set the system state are envisioned for future releases.

help
    Get a list of command-line mode commands.

quit
    Quit the application completely.

vars
    Show all user-settable monitor variables and their values.

speed <value>
    Set the virtual-time speed to value. The value can be any floating point value
    greater than zero.

cstep [on|off]
    Set or toggle calendar single-stepping.

estep [on|off]
    Set or toggle EU single-stepping.

ctrace [on|off]
    Set or toggle calendar tracing.

etrace [on|off]
    Set or toggle EU tracing.

strace [on|off]
    Set or toggle calendar switch tracing.

CHAPTER 6

Maruti  Tool  Reference 

6.1.  Maruti  Builder 

The mbuild program automates the process of building a runnable Maruti application. This
involves building the constituent application binaries, integrating and scheduling the
application, and binding the application with the desired Maruti runtime flavor.

Mbuild is normally run in the directory in which the application config file and
constituent module source files are located. It will automatically find the config file by its
.cfg extension, read it, and generate a makefile that builds the modules it finds used there
and then calls the other Maruti tools. Mbuild works by creating an obj subdirectory and
putting all output files there.

If there is more than one config file in the current directory, the desired file must be
specified with the -f <config file> option.

The user may optionally customize the mbuild actions by providing an Mbuild.inc
file in the current directory. This file will be included into the makefile generated by
mbuild. In addition to providing additional build targets and dependency lines, the user
may set some variables to modify the mbuild actions themselves:

FLAVORS Default: ux+x11 ux+et4k sa+et4k. The list of runtime flavors with which to
link the application.

MPC Default: mpc. The program executed to compile MPL programs. Not normally
modified by users.

MPC_FLAGS Default: <empty>. Supplemental flags for the MPL compiler. Most
GCC flags will work here. Most often the user will want to customize the include
directories with -I <dir>.

CFG Default: cfg. The program executed to interpret the MCL config file and integrate the
application. Not normally modified by users.

CFG_FLAGS Default: <empty>. Supplemental flags for the MCL integrator. Not
normally modified by users.

ALLOCATOR Default: allocator. The program executed to allocate and schedule the
application. Not normally modified by users.

ALLOCATOR_FLAGS Default: -p 1. Flags for the Allocator. See section 6.4. on the
Allocator below for more details.

MBIND Default: mbind. The program executed for binding the application and runtime
system. Not normally modified by users.

MBIND_FLAGS Default: <empty>. Flags for the Mbind program. Not normally
modified by users.

6.2. MPL/C Compiler

The MPL/C compiler (mpc) consists of a modified gcc plus some attendant scripts to
post-process the compiler output. It generates a .o file for a module, plus a .eul file
containing a partial elemental-unit graph to be read by the integrator.

The mpc program will accept GCC command-line options. See the gcc(1) manual
page for details on the available options. The most commonly used option will be -I dir to
customize the include directories.


6.3. MCL Integrator

The MCL Integrator (cfg) reads the application config file (appname.cfg) and all the module
elemental-unit graph files (modulename.eut), then generates and checks all the jobs, tasks,
threads, and connections for the application. It outputs a loader map file (appname.ldf)
and a complete application elemental-unit graph annotated with allocation and scheduling
constraints and communication parameters (appname.sch). There are no cfg options
normally used.

6.4.  Allocator/Scheduler 


The Allocator/Scheduler (allocator) attempts to find a valid allocation for the application
tasks across the nodes of the network, and a valid schedule for each node and for the
network bus. The allocation and schedules are considered valid if all allocation,
communication, and scheduling constraints are met.

The allocator/scheduler stops when a valid allocation and schedule is found, or
when it is determined that one cannot be found. There is no attempt to load-balance the
nodes or minimize network communications beyond what is needed for a minimally valid
schedule. The allocator outputs an allocation information file (appname.alloc) and a
calendar schedules file (appname.cal).

The allocator takes two flags:


-p <number of processors> Default 1. The number of processors in the target system.
It should match the number of nodes defined in the config file.

-t <tdma slot size> Default 1000. The Time Division Multiplexed Access (TDMA) slot
size for the network bus. This is the time, in microseconds, that each node will be allotted
to transmit on the network. All the nodes get a TDMA slot in turn. The TDMA slot size
should stay between 1000 and 16000 microseconds, depending on the application's latency
requirements and the network hardware's buffering capacity.
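
The trade-off behind the slot size can be sketched with simple arithmetic: a full TDMA round lasts one slot per node, so a larger slot lengthens the time a node may wait for its turn. The helpers below (tdma_round_us, tdma_owner, tdma_slot_in_recommended_range) are hypothetical and assume the nodes take their slots in order starting from node 0; the allocator's actual bus schedule is not described in this section.

```c
/* Length of one full TDMA round: every node gets one slot in turn. */
long tdma_round_us(int nodes, long slot_us)
{
    return (long)nodes * slot_us;
}

/* Node that owns the bus at time t_us (t_us >= 0, measured from the start
   of a round), assuming round-robin order starting at node 0. */
int tdma_owner(long t_us, int nodes, long slot_us)
{
    return (int)((t_us % tdma_round_us(nodes, slot_us)) / slot_us);
}

/* 1 if the slot size lies in the recommended 1000..16000 microsecond range. */
int tdma_slot_in_recommended_range(long slot_us)
{
    return slot_us >= 1000 && slot_us <= 16000;
}
```

With four nodes and the default 1000 microsecond slot, a round lasts 4000 microseconds; at 16000 microseconds per slot it lasts 64000 microseconds.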

6.5. Maruti Binder


The Maruti Binder (mbind) reads in the loader map .ldf file, the allocation .alloc file, and
the calendars .cal file, and generates the static data structures needed by the runtime system
(appname-globals.c). It also generates a makefile (appname-bind.mk) that manages the
linking of each task of the application within its own logical address space, then linking all
tasks together with the various flavors of the runtime library.

6.6. Timing Trace Analyzer


The Timing Trace Analyzer (timestat) takes a list of timing output files as generated by
the runtime system and generates a .wcet file that contains the worst case execution
times for the elemental units, as needed by the allocator. Timestat also prints other
statistics generated by the runtime system.
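
The core of a worst-case computation over observed runs is a maximum per elemental unit. The sketch below shows only that step; timing_sample and wcet_for_eu are hypothetical, and the actual formats of the timing output files and of the .wcet file are not given in this section.

```c
/* One observed execution of an elemental unit (hypothetical record). */
typedef struct {
    int  eu_id;     /* which elemental unit was measured */
    long exec_us;   /* observed execution time in microseconds */
} timing_sample;

/* Largest observed execution time for one EU, or -1 if it never ran. */
long wcet_for_eu(const timing_sample *samples, int n, int eu_id)
{
    long worst = -1;
    int i;

    for (i = 0; i < n; i++) {
        if (samples[i].eu_id == eu_id && samples[i].exec_us > worst)
            worst = samples[i].exec_us;
    }
    return worst;
}
```

A value obtained this way is only the worst case among the runs that were measured, so the traces should cover the longest execution paths of each elemental unit.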

6.7. Timing Stats Monitor

Timing information is output from a stand-alone Maruti system through a serial port when
the application terminates. The mgettimes program, running on another computer
connected to the other end of that serial line, will receive the timing data and store it in a
file suitable for processing by timestat. Mgettimes can process the output of multiple runs
on the test setup, even from different applications. Simply leave the program running and
any data that is received will be saved.

Mgettimes is called as follows:

mgettimes <speed> <serial-port>

where <speed> is the communications rate at which the times will be output (19200 bps in
the default core), and <serial-port> is the device file for the communications port (for
example, /dev/tty00 for the PC's COM1 port).

Maruti 3.1
Design  Overview 
First  Edition 


Systems  Design  and  Analysis  Group 
Department  of  Computer  Science 
University  of  Maryland  at  College  Park 


1.  Introduction 

Many complex, mission-critical systems depend not only on correct functional behavior,
but also on correct temporal behavior. These systems are called real-time systems. The most
critical systems in this domain are those which must support applications with hard real-time
constraints, in which missing a deadline may cause a fatal error. Due to their criticality, jobs
with hard real-time constraints must always execute satisfying the user-specified timing
constraints, despite the presence of faults such as site crashes or link failures.

A real-time operating system, besides having to support most functions of a
conventional operating system, carries the extra burden of guaranteeing that the execution
of its requested jobs will satisfy their timing constraints. In order to carry out real-time
processing, the requirements of the jobs have to be specified to the system, so that a
suitable schedule can be made for the job execution. Thus, conventional application
development techniques must be enhanced to incorporate support for specification of timing
and resource requirements. Further, tools must be made available to extract these
requirements from the application programs, and analyze them for schedulability.

Based on the characteristics of its jobs, a real-time system can be classified as static,
dynamic, or reactive. In a static system, all (hard real-time) jobs and their execution
characteristics are known ahead of time, and thus can be statically analyzed prior to system
operation. Many such systems are built using the cyclic executive or static priority
architecture. In contrast, there are many systems in which new processing requests may be
made while the system is in operation. In a dynamic system, new requests arrive
asynchronously and must be processed immediately. However, since new requests demand
immediate attention, such systems must either have “soft” constraints, or be lightly loaded
and rely on exception mechanisms for violation of timing constraints. In contrast, reactive
systems have a certain lead time to decide whether or not to accept a newly arriving
processing request. Due to the presence of the lead time, a reactive system can carry out
analysis without adversely affecting the schedulability of currently accepted requests. If
adequate resources are available then the job is accepted for execution. On the other hand, if
adequate resources are not available then the job is rejected and does not execute. The ability
to reject new jobs distinguishes a reactive system from a completely dynamic system.

The purpose of the Maruti project is to create an environment for the development
and deployment of critical applications with hard real-time constraints in a reactive
environment. Such applications must be able to execute on a platform consisting of
distributed and heterogeneous resources, and operate continuously in the presence of
faults.

The Maruti project started in 1988. The first version of the system was designed as
an object-oriented system with suitable extensions for objects to support real-time
operation. The proof-of-concept version of this design was implemented to run on top of
the Unix operating system and supported hard and non-real-time applications running in a
distributed, heterogeneous environment. The feasibility of the fault tolerance concepts
incorporated in the design of the Maruti system was also demonstrated. No changes to the
Unix kernel were made in that implementation, which was operational in 1990. We realized
that Unix is not a very hospitable host for real-time applications, as very little control over
the use of resources can be exercised in that system without extensive modifications to the
kernel. Therefore, based on the lessons learned from the first design, we proceeded with
the design of the current version of Maruti and changed the implementation base to CMU
Mach, which permitted a more direct control of resources.

Most recently, we have implemented Maruti directly on 486 PC hardware,
providing Maruti applications total control over resources. The initial version of the
distributed Maruti has also been implemented, allowing Maruti applications to run across a
network in a synchronized, hard real-time manner.

In this paper, we summarize the design philosophy of the Maruti system and
discuss the design and implementation of Maruti. We also present the development tools
and operating system support for mission-critical applications. While the system is being
designed to provide integrated support for multiple requirements of mission-critical
applications, we focus our attention on real-time requirements on a single processor
system.


2.  Maruti  Design  Goals 

The  design  of  a  real-time  system  must  take  into  consideration  the  primary  characteristics  of 
the  applications  which  are  to  be  supported.  The  design  of  Maruti  has  been  guided  by  the 
following  application  characteristics  and  requirements. 

• Real-Time Requirements. The most important requirement for real-time systems is
the capability to support the timely execution of applications. In contrast with many
existing systems, the next-generation systems will require support for hard, soft, and
non-real-time applications on the same platform.

•  Fault  Tolerance.  Many  mission-critical  systems  are  safety-critical,  and  therefore  have 
fault  tolerance  requirements.  In  this  context,  fault  tolerance  is  the  ability  of  a  system  to 
support  continuous  operation  in  the  presence  of  faults. 

Although a number of techniques for supporting fault-tolerant systems have been suggested
in the literature, they rarely consider the real-time requirements of the system. A real-time
operating system must provide support for fault tolerance and exception handling
capabilities for increased reliability while continuing to satisfy the timing requirements.

• Distributivity. The inherent characteristics of many systems require that multiple
autonomous computers, connected through a local area network, cooperate in a
distributed manner. The computers and other resources in the system may be
homogeneous or heterogeneous. Due to the autonomous operation of the components
which cooperate, system control and coordination becomes a much more difficult task
than if the system were implemented in a centralized manner. The techniques learned in
the design and implementation of centralized systems do not always extend to
distributed systems in a straightforward manner.

• Scenarios. Many real-time applications undergo different modes of operation during
their life cycle. A scenario defines the set of jobs executing in the system at any given
time. A hard real-time system must be capable of switching from one scenario to
another, maintaining the system in a safe and stable state at all times, without violating
the timing constraints.


• Integration of Multiple Requirements. The major challenge in building operating
systems for mission-critical computing is the integration of multiple requirements.
Because of the conflicting nature of some of the requirements and the solutions
developed to date, integration of all the requirements in a single system is a formidable
task. For example, the real-time requirements preclude the use of many of the
fault-handling techniques used in other fault-tolerant systems.


3.  Design  Approach  and  Principles 


Maruti  is  a  time-based  system  in  which  the  resources  are  reserved  prior  to  execution. 
Resource  reservation  is  done  on  the  time-line,  thus  allowing  for  reasoning  about  real-time 
properties  in  a  natural  way.  The  time-driven  architecture  provides  predictable  execution  for 
real-time  systems,  a  necessary  requirement  for  critical  applications  requiring  hard  real-time 
performance.  The  basic  design  approach  is  outlined  below: 

• Resource Reservation for Hard Real-Time Jobs. Hard real-time applications in
Maruti have advance resource reservation resulting in a priori guarantees about the
timely execution of hard real-time jobs. This is achieved through a calendar data
structure which keeps track of all resource reservations and the assigned time intervals.
The resource requirements are specified as early as possible in the development stage of
an application and are manipulated, analyzed, and refined through all phases of
application development.

• Predictability through Reduction of Resource Contention. Hard real-time jobs are
scheduled using a time-driven scheduling paradigm in which the resource contention
between jobs is eliminated through scheduling. This results in reduced runtime
overheads and leads to a high degree of predictability. However, not all jobs can be
pre-scheduled. Since resources may be shared between jobs in the calendar and other
jobs in the system, such as non-real-time activities, there may be resource contention
leading to lack of predictability. This is countered by eliminating as much resource
contention as possible and reducing it whenever it is not possible to eliminate it entirely.
The lack of predictability is compensated for by allowing enough slack in the schedule.

• Integrated Support for Fault Tolerance. Fault tolerance objectives are achieved by
integrating the support for fault tolerance at all levels in the system design. Fault
tolerance is supported by early fault detection and handling, resilient application
structures through redundancy, and the capability to switch modes of operation. Fault
detection capabilities are integrated into the application during its development,
permitting the use of application-specific fault detection and fault handling. As fault
handling may result in violation of temporal constraints, replication is used to make the
application resilient. Failure of a replica may not affect the timely execution of other
replicas and, thereby, the operation of the system it may be controlling. Under
anticipated load and failure conditions, it may become necessary for the system to
revoke the guarantees given to the hard real-time applications and change its mode of
operation dynamically so that an acceptable degraded mode of operation may continue.

• Separation of Mechanism and Policy. In the design of Maruti, an emphasis has been
placed on separating mechanism from policy. Thus, for instance, the system provides
basic dispatching mechanisms for a time-driven system, keeping the design of specific
scheduling policies separate. The same approach is followed in other aspects of the
system. By separating the mechanism from the policy, the system can be tailored and
optimized to different environments.

• Portability and Extensibility. Unlike many other real-time systems, the aim of the
Maruti project has been to develop a system which can be tailored to use in a wide
variety of situations, from small embedded systems to complex mission-critical
systems. With the rapid change in hardware technology, it is imperative that the design
be such that it is portable to different platforms and makes minimal assumptions about
the underlying hardware platform. Portability and extensibility are also enhanced by
using modular design with well-defined interfaces. This allows for integration of new
techniques into the design with relative ease.

• Support of Hard, Soft, and Non-Real-Time in the Same Environment. Many
critical systems consist of applications with a mix of hard, soft, and non-real-time
requirements. Since they may be sharing data and resources, they must execute within
the same environment. The approach taken in Maruti is to support the integrated
execution of applications with multiple requirements by reducing and bounding the
unpredictable interaction between them.

• Support for Distributed Operation. Many embedded systems require several
processors. When multiple processors function autonomously, their use in hard
real-time applications requires operating system support for coordinated resource
management. Maruti provides coordinated, time-based resource management of all
resources in a distributed environment including the processors and the communication
channels.

• Support for Multiple Execution Environments. Maruti provides support for
multiple execution environments to facilitate program development as well as execution.
Real-time applications may execute in the Maruti/Mach or Maruti/Standalone
environments and maintain a high degree of temporal determinacy. The
Maruti/Standalone environment is best suited for the embedded applications while
Maruti/Mach permits the concurrent execution of hard real-time and non-real-time Unix
applications. In addition, the Maruti/Virtual environment has been designed to aid the
development of real-time applications. In this environment the same code which runs in
the other two environments can execute while access to all Unix debugging tools is
available. In this environment, temporal accuracy is maintained with respect to a virtual
real-time.

• Support for Temporal Debugging. When an application executes in the
Maruti/Virtual environment its interactions are carried out with respect to virtual
real-time which is under the control of the user. The user may speed it up with respect to
actual time or slow it down. The virtual time may be paused at any instant and the
debugging tools used to examine the state of the execution. In this way we may debug
an application while maintaining all temporal relationships, a process we call temporal
debugging.
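
The calendar-based resource reservation named in the first principle above can be
illustrated with a small C sketch. It is not the Maruti implementation: the names
cal_entry, calendar, cal_reserve, and cal_owner_at, the fixed capacity, and the
half-open interval convention are all assumptions made for the example. It shows how
recording reservations on the time line lets conflicts be rejected before execution, so
that no contention remains to be resolved at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAL_MAX 32  /* arbitrary capacity chosen for this sketch */

/* One reservation of a resource on the time line: [start, end). */
struct cal_entry {
    long start;
    long end;
    int  job_id;
};

/* Hypothetical calendar for a single resource, kept sorted by start time
 * and free of overlapping entries. */
struct calendar {
    struct cal_entry e[CAL_MAX];
    size_t n;
};

/* Try to reserve [start, end) for job_id.
 * Returns false if the interval overlaps an existing reservation
 * or if the calendar is full. */
static bool cal_reserve(struct calendar *c, long start, long end, int job_id)
{
    assert(start < end);
    if (c->n == CAL_MAX)
        return false;

    /* Find the insertion point that keeps the entries sorted. */
    size_t pos = 0;
    while (pos < c->n && c->e[pos].start < start)
        pos++;

    /* Entries never overlap each other, so only the two neighbours
     * of the insertion point can conflict with the new interval. */
    if (pos > 0 && c->e[pos - 1].end > start)
        return false;
    if (pos < c->n && c->e[pos].start < end)
        return false;

    for (size_t i = c->n; i > pos; i--)
        c->e[i] = c->e[i - 1];
    c->e[pos].start = start;
    c->e[pos].end = end;
    c->e[pos].job_id = job_id;
    c->n++;
    return true;
}

/* Time-driven lookup: which job holds the resource at time t (-1 if none)? */
static int cal_owner_at(const struct calendar *c, long t)
{
    for (size_t i = 0; i < c->n; i++)
        if (t >= c->e[i].start && t < c->e[i].end)
            return c->e[i].job_id;
    return -1;
}
```

A dispatcher built on such a table only needs to look up the current time; the decision
about who may use the resource was already made when the reservation was accepted.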
4. Application Development Environment

To support applications in a real-time system, conventional application development
techniques and tools must be augmented with support for specification and extraction of
resource requirements and timing constraints. The application development system
provides a set of programming tools to support and facilitate the development of real-time
applications with diverse requirements. The Maruti Programming Language (MPL) is used
to develop individual program modules. The Maruti Configuration Language (MCL) is
used to specify how individual program modules are to be connected together to form an
application and the details of the hardware platform on which the application is to be
executed.

4.1.  Maruti  Programming  Language 

Rather than develop completely new programming languages, we have taken the approach
of using existing languages as base programming languages and augmenting them with
Maruti primitives needed to provide real-time support.

In the current version, the base programming language used is ANSI C. MPL adds
modules, shared memory blocks, critical regions, typed message passing, periodic
functions, and message-invoked functions to the C language. To make analyzing the
resource usage of programs feasible, certain C idioms are not allowed in MPL; in
particular, recursive function calls are not allowed nor are unbounded loops containing
externally visible events, such as message passing and critical region transitions.

• The code of an application is divided into modules. A module is a collection of
procedures, functions, and local data structures. A module forms an independently
compiled unit and may be connected with other modules to form a complete application.
Each module may have an initialization function which is invoked to initialize the
module when it is loaded into memory. The initialization function may be called with
arguments.


• Communication primitives send and receive messages on one-way typed channels.
There are several options for defining channel endpoints that specify what to do on
buffer overflow or when no message is in the channel. The connection of two
endpoints is done in the MCL specification for the application. Maruti ensures that
endpoints are of the same type and are connected properly at runtime.

• Periodic functions define entry points for execution in the application. The MCL
specification for the application will determine when these functions execute.


•  Message-invoked  functions,  called  services,  are  executed  whenever  messages  are 
received  on  a  channel. 

• Shared memory blocks can be declared inside modules and are connected together as
specified in the MCL specifications for the application.

•  An  action  defines  a  sequence  of  code  that  denotes  an  externally  observable  action  of  the 
module.  Actions  are  used  to  specify  timing  constraints  in  the  MCL  specification. 
• Critical regions are used to access shared data safely and to maintain data consistency
between executing entities. Maruti ensures that no two entities are scheduled to execute
inside their critical regions at the same time.
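
The channel endpoint options mentioned in the list above (what to do on buffer overflow,
and what to do when no message is in the channel) can be illustrated in plain C. This
sketch deliberately does not use MPL syntax, which is not shown in this document; the
names channel, chan_send, chan_receive, the two overflow policies, and the message type
are hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHAN_CAP 4  /* arbitrary buffer capacity for this sketch */

/* Assumed overflow options for the sending endpoint. */
enum overflow_policy { DROP_NEWEST, OVERWRITE_OLDEST };

/* The single message type this channel carries (a "typed" channel). */
struct msg {
    int    seq;
    double value;
};

/* One-way channel implemented as a ring buffer. */
struct channel {
    struct msg buf[CHAN_CAP];
    size_t head;    /* index of the oldest buffered message */
    size_t count;   /* number of buffered messages */
    enum overflow_policy on_overflow;
};

/* Send never blocks. Returns false only if the message was dropped. */
static bool chan_send(struct channel *ch, struct msg m)
{
    assert(ch->count <= CHAN_CAP);
    if (ch->count == CHAN_CAP) {
        if (ch->on_overflow == DROP_NEWEST)
            return false;
        /* OVERWRITE_OLDEST: discard the oldest message to make room. */
        ch->head = (ch->head + 1) % CHAN_CAP;
        ch->count--;
    }
    ch->buf[(ch->head + ch->count) % CHAN_CAP] = m;
    ch->count++;
    return true;
}

/* Receive never blocks. Returns false when the channel is empty,
 * leaving *out untouched. */
static bool chan_receive(struct channel *ch, struct msg *out)
{
    if (ch->count == 0)
        return false;
    *out = ch->buf[ch->head];
    ch->head = (ch->head + 1) % CHAN_CAP;
    ch->count--;
    return true;
}
```

Because neither operation blocks, the execution time of a send or receive is bounded,
which is the property resource analysis needs from communication primitives.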

4.2.  Maruti  Configuration  Language 

MPL modules are brought together into an executable application by a specification file
written in the Maruti Configuration Language (MCL). The MCL specification determines
the application’s hard real-time constraints, the allocation of tasks, threads, and shared
memory blocks, and all message-passing connections. MCL is an interpreted C-like
language rather than a declarative language, allowing the instantiation of complicated
subsystems using loops and subroutines in the specification. The key features of MCL
include:

• Tasks, Threads, and Channel Binding. Each module may be instantiated any
number of times to generate tasks. The threads of a task are created by instantiating the
entries and services of the corresponding module. An entry instantiation also indicates
the job to which the entry belongs. A service instantiation belongs to the job of its
client. The instantiation of a service or entry requires binding the input and output ports
to a channel. A channel has a single input port indicating the sender and one or more
output ports indicating the receivers. The configuration language uses channel variables
for defining the channels. The definition of a channel also includes the type of
communication it supports, i.e., synchronous or asynchronous.

• Resources. All global resources (i.e., resources which are visible outside a module) are
specified in the configuration file, along with the access restrictions on the resource.
The configuration language allows for binding of resources in a module to the global
resources. Any resources used by a module which are not mapped to a global resource
are considered local to the module.

• Timing Requirements and Constraints. These are used to specify the temporal
requirements and constraints of the program. An application consists of a set of
cooperating jobs. A job is a set of entries (and the services called by the entries) which
closely cooperate. Associated with each job are its invocation characteristics, i.e.,
whether it is periodic or aperiodic. For a periodic job, its period and, optionally, the
ready time and deadline within the period are specified. The constraints of a job apply
to all component threads. In addition to constraints on jobs and threads, finer-level
timing constraints may be specified on the observable actions. An observable action
may be specified in the code of the program. For any observable action, a ready time
and a deadline may be specified. These are relative to the job arrival. An action may not
start executing before the ready time and must finish before the deadline. Each thread is
an implicitly observable action, and hence may have a ready time and a deadline.

Apart  from  the  ready  time  and  deadline  constraints,  programs  in  Maruti  can  also  specify 
relative  timing  constraints,  those  which  constrain  the  interval  between  two  events.  For  each 
action,  the  start  and  end  of  the  action  mark  the  observable  events.  A  relative  constraint  is 
used  to  constrain  the  temporal  separation  between  two  such  events.  It  may  be  a  relative 
deadline  constraint  which  specifies  the  upper  bound  on  time  between  two  events,  or  a  delay 
constraint  which  specifies  the  lower  bound  on  time  between  the  occurrence  of  the  two 
events.  The  interval  constraints  are  closer  to  the  event-based  real-time  specifications,  which 
constrain  the  minimum  and/or  maximum  distance  between  two  events  and  allow  for  a  rich 
expression  of  timing  constraints  for  real-time  programs. 
• Replication and Fault Tolerance. At the application level fault tolerance is achieved by
creating resilient applications by replicating parts, or all, of the application. The
configuration language eases the task of achieving fault tolerance by allowing
mechanisms to replicate the modules and services, thus achieving the desired amount
of resiliency. By specifying allocation constraints, a programmer can ensure that the
replicated modules are executed on different partitions.
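
The ready time, deadline, and relative timing constraints described in the list above can
be written down as simple predicates. The following C sketch is an illustration only:
the type and function names are hypothetical, times are plain integers in an unspecified
unit measured from job arrival, and treating "finish before the deadline" as
finish <= deadline is an assumption, since the text does not say whether the bound is
strict.

```c
#include <assert.h>
#include <stdbool.h>

/* The two kinds of relative constraint between a pair of events. */
enum rel_kind {
    REL_DEADLINE,  /* upper bound on the separation of the two events */
    REL_DELAY      /* lower bound on the separation of the two events */
};

struct rel_constraint {
    enum rel_kind kind;
    long bound;    /* time units; the unit itself is not specified here */
};

/* from_time and to_time are the occurrence times of the two events
 * (each the start or end of an action), relative to job arrival. */
static bool rel_satisfied(struct rel_constraint c, long from_time, long to_time)
{
    long separation = to_time - from_time;
    if (c.kind == REL_DEADLINE)
        return separation <= c.bound;
    return separation >= c.bound;
}

/* Absolute window of an observable action: it may not start before its
 * ready time and must be finished by its deadline (non-strict bound
 * assumed). */
static bool window_satisfied(long ready, long deadline, long start, long finish)
{
    assert(ready <= deadline);
    return start >= ready && finish <= deadline;
}
```

A scheduler would evaluate such predicates against candidate start and end times while
building a calendar, rather than against times observed after the fact.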


5.  Analysis  and  Resource  Allocation 


This phase involves analyzing the resource allocation and scheduling of a collection of
applications in terms of their real-time and fault-tolerance properties. The properties of the
system are analyzed with respect to the system configuration and the characteristics of the
runtime system, and resource calendars are generated.

The analysis phase converts the application program into fine-grained segments
called elemental units. All subsequent analysis and resource allocation are based on EUs.

5.1.  Elemental  Unit  Model 

The basic building block of the Maruti computation model is the elemental unit (EU). In
general, an elemental unit is an executable entity which is triggered by incoming data and
signals, operates on the input data, and produces some output data and signals. The
behavior of an EU is atomic with respect to its environment. Specifically:

•  All  resources  needed  by  an  elemental  unit  are  assumed  to  be  required  for  the  entire 
length  of  its  execution. 

• The interaction of an EU with other entities of the system occurs either before it starts
executing or after it finishes execution.

The  components  of  an  EU  are  illustrated  in  Figure  1  and  are  described  below: 


[Figure 1: Structure of an Elemental Unit]

• Input and Output Ports. Each EU may have several input and/or output ports. Each
port specifies a part of the interface of the EU. The input ports are used to accept
incoming input data to the EU, while the output ports are used for feeding the output of
the EU to other entities in the system.

• Input and Output Monitors. An input monitor collects the data from the input ports,
and provides it to the main body. In doing so, it acts as a filter, and may also be used
for error detection and debugging. The input monitors are also used for supporting
different triggering conditions for the EU. Similar to input monitors, the output
monitors act as filters to the outgoing data. The output monitor may be used for error
detection and timing constraint enforcement. The monitors may be connected to other
EUs in the system and may send (asynchronous) messages to them reporting errors or
status messages. The receiving EU may perform some error-handling functions.

• Main Body. The main body accepts the input data from the input monitor, acts on it,
and supplies the output to the output monitor. It defines the functionality provided by
the EU.

Each elemental unit is annotated with its resource requirements and timing
constraints, which are supplied to the resource schedulers. The resource schedulers
must ensure that the resources are made available to the EU at the time of execution and that
its timing constraints are satisfied.
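
One way to picture the structure just described is as a C record that holds the ports, the
two monitors, the main body, and the annotations. This is a sketch under assumptions, not
Maruti's internal representation: the struct eu layout, the use of integer port values,
the function-pointer monitors, and the example functions are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EU_MAX_PORTS 4  /* arbitrary limit for this sketch */

struct eu {
    const char *name;
    size_t n_in, n_out;
    int in[EU_MAX_PORTS];    /* data present on input ports before execution */
    int out[EU_MAX_PORTS];   /* data placed on output ports after execution */
    bool (*input_monitor)(const struct eu *self);   /* trigger and filter check */
    void (*body)(struct eu *self);                  /* the EU's functionality */
    bool (*output_monitor)(const struct eu *self);  /* output error check */
    long wcet;               /* annotation: worst-case execution time */
    long ready, deadline;    /* annotation: timing constraints */
};

/* Run one EU: input monitor, then main body, then output monitor.
 * All interaction with other entities happens before or after this call,
 * which models the atomicity of the EU. Returns true if the EU ran and
 * its output passed the output monitor. */
static bool eu_run(struct eu *u)
{
    assert(u->body != NULL);
    if (u->input_monitor != NULL && !u->input_monitor(u))
        return false;   /* triggering condition not met or input rejected */
    u->body(u);
    if (u->output_monitor != NULL && !u->output_monitor(u))
        return false;   /* output failed its check; caller may report it */
    return true;
}

/* Example pieces: accept only non-negative input, double it, and require
 * the result to stay below 100. */
static bool in_nonneg(const struct eu *u)     { return u->in[0] >= 0; }
static void body_double(struct eu *u)         { u->out[0] = 2 * u->in[0]; }
static bool out_below_100(const struct eu *u) { return u->out[0] < 100; }
```

In this picture a failed monitor is where an error or status message to another EU would
be generated; the sketch simply returns false instead.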

5.2.  Composition  of  EUs 

In order to define complex executions, the EUs may be composed together and properties
specified on the composition. Elemental units are composed by connecting an output port
of an EU with an input port of another EU. A valid connection requires that the input and
output port types are compatible, i.e., they carry the same message type. Such a connection
marks a one-way flow of data or control, depending on the nature of the ports. A
composition of EUs can be viewed as a directed acyclic graph, called an elemental unit
graph (EUG), in which the nodes are the EUs, and the edges are the connections between
EUs. An incompletely specified EUG, in which not all input and output ports are connected,
is termed a partial EUG (PEUG). A partial EUG may be viewed as a higher level EU.
In a complete EUG, all input and output ports are connected and there are no cycles in the
graph. The acyclic requirement comes from the required time determinacy of execution. A
program with unbounded cycles or recursions may not have a temporally determinate
execution time. Bounded cycles in an EUG are converted into an acyclic graph by loop
unrolling.
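
The acyclicity requirement on a complete EUG can be checked mechanically. The sketch
below uses Kahn's topological-sort algorithm, a standard technique that is swapped in
here for illustration; the text does not say how Maruti performs this check. The
adjacency-matrix representation and the names eug and eug_topo_order are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EUG_MAX 16  /* arbitrary node limit for this sketch */

struct eug {
    size_t n;                     /* number of EUs (nodes) */
    bool edge[EUG_MAX][EUG_MAX];  /* edge[a][b]: an output of a feeds an input of b */
};

/* Kahn's algorithm. If the graph is acyclic, fills order[0..n-1] with a
 * precedence-respecting order of the EUs and returns true. Returns false
 * if the graph contains a cycle. */
static bool eug_topo_order(const struct eug *g, size_t *order)
{
    size_t indeg[EUG_MAX] = {0};
    bool done[EUG_MAX] = {false};

    assert(g->n <= EUG_MAX);

    /* Count incoming connections of every EU. */
    for (size_t a = 0; a < g->n; a++)
        for (size_t b = 0; b < g->n; b++)
            if (g->edge[a][b])
                indeg[b]++;

    for (size_t placed = 0; placed < g->n; placed++) {
        /* Pick any unplaced EU whose predecessors are all placed. */
        size_t pick = g->n;
        for (size_t v = 0; v < g->n; v++) {
            if (!done[v] && indeg[v] == 0) {
                pick = v;
                break;
            }
        }
        if (pick == g->n)
            return false;  /* every remaining EU waits on another: a cycle */

        done[pick] = true;
        order[placed] = pick;
        for (size_t b = 0; b < g->n; b++)
            if (g->edge[pick][b])
                indeg[b]--;
    }
    return true;
}
```

The order produced is one valid precedence order; a scheduler is free to choose any
other order that respects the same edges.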

The composition of EUs supports higher level abstractions and the properties associated
with them. By carefully choosing the abstractions, the task of developing applications and
ensuring that the timing and other operational constraints are satisfied can be greatly
simplified. In Maruti, we have chosen the following abstractions:

• A thread is a sequential composition of elemental units. It has a sequential flow of
control which is triggered by a message to the first EU in the thread. The flow of
control is terminated with the last EU in the thread. Two adjacent EUs of a thread are
connected by a single link carrying the flow of control. The component elemental units
may receive messages or send messages to elemental units outside the thread. All EUs
of a thread share the execution stack and processor state.

• A job is a collection of threads which cooperate with each other to provide some
functionality. The partial EUGs of the component threads are connected together in a
well-defined manner to form a complete EUG. All threads within a job operate under a
global timing constraint specified for the job.

5.3.  Program  Analysis 

Program  modules  are  independently  compiled.  In  addition  to  the  generation  of  the  object 
code,  compilation  also  results  in  the  creation  of  partial  EUGs  for  the  modules,  i.e.,  for  the 
services  and  entries  in  the  module,  as  well  as  the  extraction  of  resource  requirements  such 
as  stack  sizes  for  threads,  memory  requirements,  and  logical  resource  requirements. 

Invocation  of  an  entry  point  and  service  call  starts  a  new  thread  of  execution.  A 
control  flow  graph  is  generated  for  each  service  and  entry.  The  control  flow  graph  and  the 
MPL  primitives  are  used  to  delineate  EU  boundaries.  Note  that  an  EU  execution  is  atomic, 
i.e.,  all  resources  required  by  the  EU  are  assumed  to  be  used  for  the  entire  duration  of  its 
execution.  Further,  all  input  messages  are  assumed  to  be  logically  received  at  the  start  of  an 
EU  and  all  output  messages  are  assumed  to  be  logically  sent  at  the  end  of  an  EU.  At 
compilation  toe,  the  code  for  each  entry  and  service  is  broken  up  into  one  or  more 
elemental  units.  The  delineation  of  EU  boundaries  is  done  in  a  manner  that  ensures  that  no 
cycles  are  formed  in  the  resultant  EUG.  Thus,  for  instance,  a  send  followed  by  a  receive 
within  the  same  EU  may  result  in  a  cyclic  precedence  and  must  be  prevented.  We  follow 
certain  rules  of  thumb  to  delineate  EU  boundaries,  which  may  be  overridden  and  explicitly 
changed  by  the  user.  The  EU  boundaries  are  created  at  a  receive  statement,  the  beginning 
and  end  of  a  resource  block,  and  the  beginning  and  end  of  an  observable  action.  For  each 
elemental  unit  a  symbolic  name  is  generated  and  is  used  to  identify  it  The  predecessors  and 
successors  of  the  EU  as  well  as  the  source  code  line  numbers  associated  with  the  EU  are 
identified  and  stored.  The  resource  and  timing  requirements  that  can  be  identified  during 
compilation  are  also  stored,  and  place  holders  are  created  for  the  remaining  information. 
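
The record kept for each elemental unit, and the requirement that the resulting EUG contain no cycles, can be illustrated with a small sketch. The struct layout, the size limits, and the function name eug_is_acyclic are hypothetical; the report does not show the compiler's actual data structures. The check shown is Kahn's topological-sort algorithm, used here only as one standard way to detect a cycle.

```c
#include <assert.h>

#define MAX_EUS   16   /* illustrative limits, not Maruti's */
#define MAX_SUCCS 8

/* Hypothetical per-EU record: symbolic name, source range, successors. */
struct eu {
    const char *name;            /* generated symbolic name */
    int first_line, last_line;   /* source lines covered by the EU */
    int succ[MAX_SUCCS];         /* indices of successor EUs */
    int nsucc;                   /* number of valid entries in succ[] */
};

/* Returns 1 if the precedence graph over eus[0..n-1] has no cycle. */
int eug_is_acyclic(const struct eu *eus, int n)
{
    int indeg[MAX_EUS] = {0};
    int queue[MAX_EUS];
    int head = 0, tail = 0, visited = 0;

    assert(n >= 0 && n <= MAX_EUS);

    /* Count the predecessors of every EU. */
    for (int i = 0; i < n; i++)
        for (int k = 0; k < eus[i].nsucc; k++)
            indeg[eus[i].succ[k]]++;

    /* Start from the EUs that have no predecessor. */
    for (int i = 0; i < n; i++)
        if (indeg[i] == 0)
            queue[tail++] = i;

    /* Remove EUs one by one; an EU on a cycle is never removed. */
    while (head < tail) {
        int u = queue[head++];
        visited++;
        for (int k = 0; k < eus[u].nsucc; k++) {
            int v = eus[u].succ[k];
            if (--indeg[v] == 0)
                queue[tail++] = v;
        }
    }
    return visited == n;
}
```

A send followed by a receive inside one EU would show up here as an edge pair that closes a loop, which is the situation the boundary rules are meant to prevent.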

Given an application specification in the Maruti Configuration Language and the
component application modules, the integration tools are responsible for creating a
complete application program and extracting the resource and timing information for
scheduling and resource allocation. The inputs to the integration process are the program
modules, the partial EUGs corresponding to the modules, the application configuration
specification, and the hardware specifications. The outputs of the integration process are: a
specification for the loader for creating tasks, populating their address spaces, creating the
threads and channels, and initializing the task; loadable executables of the program; and the
complete application EUG along with the resource descriptions for the resource allocation
and scheduling subsystem.

5.4.  Communication  Model 

Maruti  supports  message  passing  and  shared  memory  models  for  communication. 

• Message Passing. Maruti supports the notion of one-way message passing between
elemental units. Message passing provides a location-independent and
architecture-transparent communication paradigm. A channel abstraction is used to specify a
one-way message communication path between a sender and a receiver. A one-way
message-passing channel is set up by declaring the output port on the sender EU, the
input port on the receiver EU, and the type of message. The communication is
asynchronous with respect to the sender, i.e., the sender does not block.

• Synchronous Communication. Synchronous communication is used for tightly
coupled message passing between elemental units of the same job. For every invocation
of the sender there is an invocation of the receiver which accepts the message sent by
the sender. The receiver is blocked (de-scheduled) until message arrival under normal
circumstances. The messages in a synchronous communication channel are delivered in
FIFO order.

•  Asynchronous  Communication.  Asynchronous  communication  may  be  used  for 
message  passing  between  elemental  units  not  belonging  to  the  same  job.  It  may  also  be 
used  between  real-time  and  non-real-time  jobs.  In  such  communication,  neither  the 
sender  nor  the  receiver  is  blocked  (i.e.,  there  is  no  synchronization).  Since  the  sender 
and receiver may execute at different rates, it is possible that no finite number of buffers
suffices. Hence, an asynchronous communication channel is inherently lossy. The
receiver may specify its input port to be inFirst or inLast to indicate which messages to
drop when the buffers are full. The first message is dropped in an inLast channel, while
the last message is dropped in an inFirst channel.

There  may  be  multiple  receivers  of  a  message,  thus  allowing  for  multi-cast  messages. 
Similar  to  a  one-to-one  channel,  a  multicast  channel  may  also  be  synchronous  or 
asynchronous.  All  receivers  of  a  multi-cast  message  must  be  of  the  same  type. 

• Shared Memory. Shared memory is also supported in Maruti. The simplest way to
share memory between EUs is to allow them to exist within the same address space.
We use the task abstraction for this purpose. A task consists of multiple threads operating
within  it,  sharing  the  address  space.  The  task  serves  as  an  execution  environment  for 
the  component  threads.  A  thread  may  belong  to  only  one  task.  In  addition  to  the  shared 
memory  within  a  task,  inter-task  sharing  is  also  supported  through  the  creation  of 
shared  memory  partitions.  A  shared  memory  partition  is  a  shared  buffer  which  can  be 
accessed  by  any  EU  permitted  to  do  so.  The  shared  memory  partitions  provide  an 
efficient  way  to  access  data  shared  between  multiple  EUs.  The  shared  memory 
communication  paradigm  provides  just  the  shared  memory  -  it  is  the  user's 
responsibility  to  ensure  safe  access  to  the  shared  data.  This  can  be  done  by  defining  a 
logical  resource  and  ensuring  that  the  resource  is  acquired  every  time  the  shared  data  is 
accessed.  By  providing  appropriate  restrictions  on  the  logical  resource,  safe  access  to 
data  can  be  ensured. 
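
The drop policies of a lossy asynchronous channel described above can be sketched as a small bounded buffer. This is an illustration only: the type and function names, the buffer size, and the use of plain integers as messages are assumptions, and the sketch reads "first" as the oldest buffered message and "last" as the newly arriving one.

```c
#include <assert.h>

#define CHAN_CAP 4   /* illustrative buffer count */

enum port_mode { IN_FIRST, IN_LAST };  /* stand-ins for inFirst / inLast */

struct async_chan {
    enum port_mode mode;
    int buf[CHAN_CAP];   /* messages are plain ints in this sketch */
    int head;            /* index of the oldest buffered message */
    int count;           /* number of buffered messages */
    int dropped;         /* messages lost so far */
};

void chan_init(struct async_chan *c, enum port_mode mode)
{
    c->mode = mode;
    c->head = c->count = c->dropped = 0;
}

/* Never blocks. When the buffers are full, an inFirst port discards the
 * new (last) message and an inLast port discards the oldest (first) one. */
void chan_send(struct async_chan *c, int msg)
{
    assert(c->count >= 0 && c->count <= CHAN_CAP);
    if (c->count == CHAN_CAP) {
        c->dropped++;
        if (c->mode == IN_FIRST)
            return;                          /* drop the newcomer */
        c->head = (c->head + 1) % CHAN_CAP;  /* drop the oldest */
        c->count--;
    }
    c->buf[(c->head + c->count) % CHAN_CAP] = msg;
    c->count++;
}

/* Never blocks. Returns 1 and stores a message, or 0 if none is buffered. */
int chan_recv(struct async_chan *c, int *msg)
{
    if (c->count == 0)
        return 0;
    *msg = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    return 1;
}
```

With four buffers and six messages sent before any receive, an inFirst port would deliver messages 1 to 4 and an inLast port messages 3 to 6.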

5.5.  Resource  Model 

A distributed system consists of a collection of autonomous processing nodes connected
via a local area network. Each processing node has resources classified as processors,
logical resources, and peripheral devices. Logical resources are used to provide safe access
to shared data structures and are passive in nature. The peripheral devices include sensors
and actuators. Restrictions may be placed on the preemptability of resources to maintain
resource  consistency.  The  type  of  the  resource  determines  the  restrictions  that  are  placed  on 
the  preemptability  of  the  resource  and  serves  to  identify  operational  constraints  for  the 
purpose  of  resource  allocation  and  scheduling.  We  classify  the  resources  into  the  following 
types based on the restrictions that are imposed on their usage.

• Non-preemptable. The inherent characteristics of a resource may be such that it
prevents  preemptability,  i.e.,  any  usage  of  the  resource  must  not  be  preempted.  Many 
devices  require  non-preemptive  scheduling.  For  resources  which  require  the  use  of 
CPU,  this  implies  non-preemptive  execution  from  the  time  the  resource  is  acquired  until 
the  time  the  resource  is  released. 

• Exclusive. Unlike a non-preemptable resource, an exclusive resource can be preempted.
However,  the  resource  may  not  be  granted  to  anyone  else  in  the  meantime.  A  critical 
section  is  an  example  of  a  resource  which  must  be  used  in  exclusive  mode. 

•  Serially  Reusable.  A  serially  reusable  resource  can  not  only  be  preempted  but  may 
also  be  granted  to  another  EU.  The  state  of  such  resources  can  be  preserved  and 
restored  when  the  resource  is  granted  back. 

• Shared. A shared resource may be used by multiple entities simultaneously. In a single
processor  system,  since  only  one  entity  is  executing  at  a  given  time,  there  is  no 
distinction  between  a  shared  resource  and  a  serially  reusable  resource. 
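
The four resource types can be summarized as a small table of predicates. The enum and function names below are hypothetical and serve only to restate the definitions in the list above; they are not part of MPL or the Maruti tools.

```c
#include <assert.h>

/* Listed from the most restrictive type to the least restrictive one. */
enum res_type {
    RES_NONPREEMPTABLE,
    RES_EXCLUSIVE,
    RES_SERIALLY_REUSABLE,
    RES_SHARED
};

/* May the holder be preempted while it holds the resource? */
int res_holder_preemptable(enum res_type t)
{
    return t != RES_NONPREEMPTABLE;
}

/* May the resource be granted to another EU while the holder is preempted? */
int res_grantable_while_held(enum res_type t)
{
    return t == RES_SERIALLY_REUSABLE || t == RES_SHARED;
}

/* May several entities use the resource at the same instant? */
int res_simultaneous_use(enum res_type t)
{
    return t == RES_SHARED;
}

/* Is type a strictly more restrictive than type b? */
int res_more_restrictive(enum res_type a, enum res_type b)
{
    assert(a >= RES_NONPREEMPTABLE && a <= RES_SHARED);
    assert(b >= RES_NONPREEMPTABLE && b <= RES_SHARED);
    return a < b;
}
```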

A  non-preemptable  resource  is  the  most  restrictive  and  a  shared  resource  is  the 
least  restrictive  in  terms  of  the  type  of  usage  allowed.  An  application  requesting  the  use  of  a 
resource  must  specify  when  the  resource  is  to  be  acquired,  when  it  is  to  be  released,  and 
the  restrictions  on  the  preemptability  of  the  resource.  The  resource  requirements  for 
applications  may  be  specified  at  different  levels  of  computational  abstractions  as  identified 
below. 

•  EU  level.  The  lowest  level  a  resource  requirement  can  be  specified  at  is  the  EU  level.  A 
resource  requirement  specified  at  the  EU  level  implies  that  the  resource  is  acquired  and 
released  within  the  EU.  For  scheduling  purposes,  it  is  assumed  that  the  resource  is 
required  for  the  entire  duration  of  the  execution  of  the  EU. 


•  Thread  Level.  Resource  specification  at  the  thread  level  is  used  for  resources  which 
are  acquired  and  released  by  different  EUs  belonging  to  the  same  thread.  For  instance, 
a  critical  section  may  be  acquired  in  one  EU  and  released  in  another  one. 

•  Job  Level.  Job-level  resource  specifications  are  used  to  specify  resources  which  are  not 
acquired  and  released  for  each  invocation  of  a  periodic  or  sporadic  job.  Instead,  these 
resources are acquired at job initialization and released at job termination. For a
periodic job, the implicit resources associated with each thread are the thread data
structures (including processor stack and registers).

5.6.  Operational  Constraints 

The  execution  of  EUs  is  constrained  through  various  kinds  of  operational  constraints.  Such 
constraints  may  arise  out  of  restricted  resource  usage  or  through  the  operational 
requirements  of  the  application.  Examples  of  such  constraints  are:  precedence,  mutual 
exclusion,  ready  time,  and  deadline.  They  may  be  classified  into  the  following  categories: 

•  Synchronization  Constraints.  Synchronization  constraints  arise  out  of  data  and 
control  dependencies  or  through  resource  preemption  restrictions.  Typical  examples  of 
such constraints are precedence and mutual exclusion.

• Timing Constraints. Many types of timing constraints may be specified at different
levels, i.e., at job level, thread level, or EU level. At the job level, one may specify the
ready time, deadline, and whether the job is periodic, sporadic, or aperiodic. For
threads, a ready time and deadline may be specified relative to the job arrival. Likewise,
a  ready  time  and  deadline  may  be  specified  for  an  individual  EU.  We  also  support  the 
notion  of  relative  timing  constraints,  i.e.,  constraints  on  the  temporal  distance  between 
the  execution  of  two  EUs. 

•  Allocation  Constraints.  In  our  model,  tasks  are  allocated  to  processing  nodes. 
Allocation  constraints  are  used  to  restrict  the  task  allocation  decisions.  Allocation 
constraints  often  arise  due  to  fault-tolerance  requirements,  where  the  replicas  of  EUs 
must  be  allocated  on  different  processing  nodes.  Similarly,  when  two  tasks  share 
memory,  they  must  be  allocated  on  the  same  processing  node.  Sometimes  a  task  must 
be  bound  to  a  processing  node  since  it  uses  a  particular  resource  bound  to  the  node 
(e.g.,  a  sensor). 

The operational constraints are made available to the resource allocation and
scheduling tools, which must ensure that the allocation and scheduling maintain the
restrictions imposed by the constraints. The model does not place any a priori restrictions
on the nature of the constraints that may be specified. However, the techniques used by the
resource allocator and scheduler will depend on the type of constraints that can be
specified.
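
To make the synchronization and timing constraints concrete, the sketch below checks them against a candidate placement of EUs in time. The struct and function names are hypothetical, times are in abstract ticks, and the reading of a relative timing constraint as a bounded gap between the finish of one EU and the start of another is an assumption; the report does not define the exact form.

```c
#include <assert.h>

/* Hypothetical schedule entry for one EU instance; times in ticks. */
struct eu_slot {
    long start, finish;     /* scheduled execution interval */
    long ready, deadline;   /* timing constraints on this EU */
};

/* Timing constraint: the EU runs inside its [ready, deadline] window. */
int meets_window(const struct eu_slot *e)
{
    assert(e->start <= e->finish);
    return e->start >= e->ready && e->finish <= e->deadline;
}

/* Precedence: 'before' completes no later than 'after' starts. */
int meets_precedence(const struct eu_slot *before, const struct eu_slot *after)
{
    return before->finish <= after->start;
}

/* Mutual exclusion: the two execution intervals do not overlap. */
int meets_exclusion(const struct eu_slot *a, const struct eu_slot *b)
{
    return a->finish <= b->start || b->finish <= a->start;
}

/* Relative timing (assumed form): b starts between min_gap and max_gap
 * ticks after a finishes. */
int meets_relative(const struct eu_slot *a, const struct eu_slot *b,
                   long min_gap, long max_gap)
{
    long gap = b->start - a->finish;
    return gap >= min_gap && gap <= max_gap;
}
```

A scheduler could call such predicates on each candidate schedule it generates and reject any schedule for which one of them fails.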

5.7.  Allocation  and  Scheduling 

After  the  application  program  has  been  analyzed  and  its  resource  requirements  and 
execution  constraints  identified,  it  can  be  allocated  and  scheduled  for  a  runtime  system. 

This final phase of program development depends upon the physical characteristics
of the hardware on which the application will be run, for example, the location of devices
and the number of nodes and type of processors on each node in the distributed system.

Maruti uses time-based scheduling, and the scheduler creates a data structure called a
calendar which defines the execution instances in time for all executable components of
the applications to be run concurrently.

We consider static allocation and scheduling, in which a task is the finest-granularity
object of allocation and an EU instance is the unit of scheduling. In order to
make the execution of instances satisfy the specifications and meet the timing constraints,
we consider a scheduling frame whose length is the least common multiple of all tasks'
periods. As long as one instance of each EU is scheduled in each period within the
scheduling frame and these executions meet the timing constraints, a feasible schedule is
obtained.
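
The length of the scheduling frame follows directly from the task periods. The sketch below computes it as a least common multiple, assuming periods are positive integers in a common time unit; the function names are illustrative and not taken from the Maruti tools.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static long gcd_long(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Frame length: least common multiple of all task periods. */
long frame_length(const long *periods, int n)
{
    long frame = 1;
    for (int i = 0; i < n; i++) {
        assert(periods[i] > 0);
        /* Divide first to keep the intermediate value small. */
        frame = frame / gcd_long(frame, periods[i]) * periods[i];
    }
    return frame;
}

/* Number of instances of a task with the given period in one frame. */
long instances_per_frame(long frame, long period)
{
    assert(period > 0 && frame % period == 0);
    return frame / period;
}
```

For periods of 10, 15, and 4 time units the frame is 60 units long, so the calendar must hold 6, 4, and 15 instances of the respective tasks' EUs.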

As  a  part  of  the  Maruti  development  effort,  a  number  of  scheduling  techniques  have 
been  developed  and  are  used  for  generating  schedules  and  calendars  for  task  sets.  These 
techniques  include  the  use  of  temporal  analysis  and  simulated  annealing.  Schedules  for 
single-processor  systems  as  well  as  multiple-processor  networks  are  developed  using  these 
techniques.

6. Maruti Runtime System

The  runtime  system  provides  the  conventional  functionality  of  an  operating  system  in  a 
manner  that  supports  the  timely  dispatching  of  jobs.  There  are  two  major  components  of 
the  runtime  system  -  the  Maruti  core,  which  is  the  operating  system  code  that  implements 
scheduling,  message  passing,  process  control,  thread  control,  and  low  level  hardware 
control,  and  the  runtime  dispatcher,  which  performs  resource  allocation  and  scheduling  for 
dynamic  arrivals. 

6.1.  The  Dispatcher 

The  dispatcher  carries  out  the  following  tasks: 

• Resource Management. The dispatcher handles requests to load applications. This
involves  creating  all  the  tasks  and  threads  of  the  application,  reserving  memory,  and 
loading  the  code  and  data  into  memory.  All  the  resources  are  reserved  before  an 
application  is  considered  successfully  loaded  and  ready  to  run. 


•  Calendar  Management.  The  dispatcher  creates  and  loads  the  calendars  used  by 
applications  and  activates  them  when  the  application  run  time  arrives.  The  application 
itself  can  activate  and  deactivate  calendars  for  scenario  changes. 

• Connection Management. A Maruti application may consist of many different tasks
using  channels  for  communication.  The  dispatcher  sets  up  the  connections  between  the 
application  tasks  using  direct  shared  buffers  for  local  connections  or  a  shared  buffer 
with  a  communications  agent  for  remote  connections. 

• Exception Handling. Rogue application threads may generate exceptions such as
missed  deadlines,  arithmetic  exceptions,  stack  overflows,  and  stray  accesses  to 
unreserved  memory.  These  exceptions  are  normally  handled  by  the  dispatcher  for  all 
the  Maruti  application  threads.  Various  exception  handling  behaviors  can  be 
configured,  from  terminating  the  entire  application  or  just  the  errant  thread,  to  simply 
invoking  a  task-specific  handler. 

6.2.  Core  Organization 

The  core  of  the  Maruti  hard  real-time  runtime  system  consists  of  three  data  structures: 

•  The  calendars  are  created  and  loaded  by  the  dispatcher.  Kernel  memory  is  reserved  for 
each  calendar  at  the  time  it  is  created.  Several  system  calls  serve  to  create,  delete, 
modify,  activate,  and  deactivate  calendars. 

• The results table holds timing and status results for the execution of each elemental
unit.  The  maruti_calendar_results  system  call  reports  these  results  back  up  to  the  user 
level,  usually  to  the  dispatcher.  The  dispatcher  can  then  keep  statistics  or  write  a  trace 
file. 

•  The  pending  activation  table  holds  all  outstanding  calendar  activation  and  deactivation 
requests.  Since  the  requests  can  come  before  the  switch  time,  the  kernel  must  track  the 
requests and execute them at the correct time in the correct order.

The scheduler gains control of the CPU at every clock tick interrupt. At that time, if
a Maruti thread is currently running and its deadline has passed, its execution is stopped and
an exception is raised.

If any pending activations are due to be executed, those requests are handled,
thereby changing the set of active calendars. Then the next calendar entry is checked to see if
it is scheduled to execute at this time; if so, the scheduler switches immediately to the
specified thread. If no hard real-time threads are scheduled to execute, the calendar
scheduler falls through to the soft and non-real-time, priority-based schedulers.

Maruti  threads  indicate  to  the  scheduler  that  they  have  successfully  reached  the  end 
of  their  elemental  unit  with  the  maruti_unit_done  system  call.  This  call  marks  the  current 
calendar entry as done and fills in the time actually used by the thread. The Maruti thread is
then suspended until it next appears in the calendars. Soft and non-real-time threads can be
run until the next calendar entry is scheduled and are executed using priority-based
scheduling in the available time slots.

At  all  times  the  Maruti  scheduler  knows  which  calendar  entry  will  be  the  next  one  to 
run  so  that  the  calendars  are  not  continually  searched  for  work.  This  is  recalculated  when 
maruti_unit_done  is  called  or  whenever  the  set  of  active  calendars  changes. 
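
The per-tick decision described in this section can be sketched roughly as follows. This is a simplified model, not the Maruti core: it handles a single calendar, omits the pending activation step, treats a running EU as non-preemptable, and treats a deadline as passed once the tick time exceeds it. All type and function names are hypothetical; unit_done merely stands in for the effect of the maruti_unit_done call.

```c
#include <assert.h>

enum tick_action {
    RUN_CALENDAR_ENTRY,        /* switch to the next hard real-time EU */
    CONTINUE_CURRENT,          /* let the running EU carry on */
    RAISE_DEADLINE_EXCEPTION,  /* running EU overran its deadline */
    RUN_PRIORITY_SCHEDULER     /* fall through to soft/non-real-time */
};

struct cal_entry {
    long release;    /* tick at which the EU is to start */
    long deadline;   /* tick by which it must have completed */
    int  thread_id;
    int  done;       /* set when the EU reports completion */
};

struct calendar {
    struct cal_entry *entries;  /* sorted by release time */
    int n;
    int next;      /* cached index of the next entry to run */
    int running;   /* index of the entry now running, or -1 */
};

/* Decide what the scheduler does at clock tick 'now'. */
enum tick_action on_tick(struct calendar *c, long now)
{
    assert(c->next >= 0 && c->next <= c->n);

    /* A running hard real-time EU is stopped if its deadline has passed. */
    if (c->running >= 0) {
        if (now > c->entries[c->running].deadline) {
            c->running = -1;
            return RAISE_DEADLINE_EXCEPTION;
        }
        return CONTINUE_CURRENT;
    }
    /* Switch to the next entry if its release time has arrived. */
    if (c->next < c->n && c->entries[c->next].release <= now) {
        c->running = c->next++;
        return RUN_CALENDAR_ENTRY;
    }
    /* Otherwise the slot belongs to soft and non-real-time threads. */
    return RUN_PRIORITY_SCHEDULER;
}

/* Stand-in for maruti_unit_done: mark the running entry complete. */
void unit_done(struct calendar *c)
{
    if (c->running >= 0) {
        c->entries[c->running].done = 1;
        c->running = -1;
    }
}
```

Keeping the index of the next entry in the calendar structure reflects the point made above that the calendars are not searched on every tick.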

6.3.  Multiple  Scenarios 

The  Maruti  design  includes  the  concept  of  scenarios,  implemented  at  runtime  as  sets  of 
alternative  calendars  that  can  be  switched  quickly  to  handle  an  emergency  or  a  change  in 
operating  mode.  These  calendars  are  pre-scheduled  and  able  to  begin  execution  without 
having  to  invoke  any  user-level  machinery.  The  dispatcher  loads  the  initial  scenarios 
specified  by  the  application  and  activates  one  of  them  to  begin  normal  execution.  However, 
the  application  itself  can  activate  and  deactivate  scenarios.  For  example,  an  application 
might need to respond instantaneously to the pressing of an emergency shutdown button. A
single system call then causes the immediate suspension of normal activity and the running
of the shutdown code sequence. Calendar activation and deactivation commands can be
issued before the desired switch time. The requests are recorded and the switches occur at
the precise moment specified. This allows the application to ensure smooth transitions at
safe points in the execution.
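
Recording switch requests ahead of time and applying them in order at the requested instant might look like the sketch below. The table layout, its size limits, and the function names are assumptions for illustration; the kernel's real pending activation table is not described in this report beyond its purpose.

```c
#include <assert.h>

#define MAX_PENDING   8   /* illustrative limits */
#define MAX_CALENDARS 8

struct switch_req {
    long when;       /* tick at which the switch takes effect */
    int  calendar;   /* index of the calendar (scenario) */
    int  activate;   /* 1 = activate, 0 = deactivate */
};

struct pending_table {
    struct switch_req req[MAX_PENDING];  /* kept sorted by 'when' */
    int n;
};

/* Record a request ahead of time; requests for the same tick keep
 * their arrival order. Returns 0 if the table is full. */
int pending_add(struct pending_table *t, struct switch_req r)
{
    int i;
    if (t->n == MAX_PENDING)
        return 0;
    for (i = t->n; i > 0 && t->req[i - 1].when > r.when; i--)
        t->req[i] = t->req[i - 1];
    t->req[i] = r;
    t->n++;
    return 1;
}

/* Apply every request due at or before 'now', in order.
 * Returns the number of requests applied. */
int pending_apply_due(struct pending_table *t, long now,
                      int active[MAX_CALENDARS])
{
    int due = 0;
    while (due < t->n && t->req[due].when <= now) {
        int cal = t->req[due].calendar;
        assert(cal >= 0 && cal < MAX_CALENDARS);
        active[cal] = t->req[due].activate;
        due++;
    }
    /* Remove the applied requests from the front of the table. */
    for (int i = due; i < t->n; i++)
        t->req[i - due] = t->req[i];
    t->n -= due;
    return due;
}
```

An emergency shutdown would then be a pair of requests for the same tick: deactivate the normal calendar and activate the shutdown calendar.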


Figure 2: Maruti System Architecture

7. Maruti 3.1 System Architecture

7.1. Runtime Environments

allowing temporal debugging, including single stepping the real-time calendars.

The Maruti/Mach runtime environment is a modified version of Mach which allows the
running of real-time Maruti programs within the Mach environment, where real-time
and non-real-time tasks can co-exist and interact on the same host.


7.2. Maruti/Virtual Runtime Environment

Testing real-time programs in their native embedded environments can be tedious and very
time-consuming because of the lack of debugging facilities and the requirement to reload
and reboot the target computer every time a change is made. Maruti provides a Unix-based
runtime system that allows the execution of Maruti hard real-time applications from within
the Unix development environment. This Unix execution environment supports the
following features:

• The Maruti application has direct control of its I/O device hardware.

• Graphical output and keyboard input can go either to the PC console, as in the
Maruti/Standalone and Maruti/Mach environments, or appear in an X window on any
Unix workstation, possibly across the network.

• The application can be run under the Unix GNU Debugger, allowing the examination
of program variables and stack traces, setting of breakpoints, and post-mortem
analysis.


Figure  3:  Maruti/Virtual  screen  running  in  the  development  environment 


• The application has access to Unix standard output so it can print debug and status
messages to the interactive session while running.

• The Maruti application runs in virtual real-time; that is, it sees itself running in hard
real-time against a virtual time base.

• The virtual time can be manipulated through the runtime system for temporal
debugging. Virtual time can be slowed down or sped up, and individual elemental units
(EUs) or whole calendars can be single-stepped or traced.
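
A virtual time base of the kind described above can be modeled as a clock whose rate relative to real time is adjustable and which can be paused or stepped. The sketch below is purely illustrative; the structure, the rational rate representation, and the function names are assumptions and do not reflect the Maruti/Virtual implementation.

```c
#include <assert.h>

struct vclock {
    long now;        /* current virtual time, in ticks */
    long rate_num;   /* virtual ticks advanced ...    */
    long rate_den;   /* ... per rate_den real ticks   */
    long carry;      /* remainder kept so no time is lost to rounding */
    int  paused;     /* nonzero while virtual time is frozen */
};

void vclock_init(struct vclock *v)
{
    v->now = 0;
    v->rate_num = 1;
    v->rate_den = 1;
    v->carry = 0;
    v->paused = 0;
}

/* Slow down (num < den) or speed up (num > den) virtual time. */
void vclock_set_rate(struct vclock *v, long num, long den)
{
    assert(num >= 0 && den > 0);
    v->rate_num = num;
    v->rate_den = den;
    v->carry = 0;
}

/* Account for 'real_ticks' of real time; returns the virtual time. */
long vclock_advance(struct vclock *v, long real_ticks)
{
    if (!v->paused) {
        long scaled = real_ticks * v->rate_num + v->carry;
        v->now += scaled / v->rate_den;
        v->carry = scaled % v->rate_den;
    }
    return v->now;
}

/* Single step: jump directly to the release time of the next EU. */
long vclock_step_to(struct vclock *v, long next_release)
{
    if (next_release > v->now) {
        v->now = next_release;
        v->carry = 0;
    }
    return v->now;
}
```

Because the application only ever observes the virtual clock, pausing or stepping it leaves the temporal relationships seen by the application intact.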

7.3.  Maruti/Standalone  Real-Time  Environment 

Maruti/Standalone provides a minimal runtime system for the execution of a Maruti
application on the bare hardware. The stand-alone environment has the following
attributes:

• The stand-alone version of an application is built from the same object modules as are
used in the Unix and Maruti/Mach execution environments.


•  All  the  modules  of  the  application  are  bound  with  only  those  routines  of  the  Maruti  core 
that  are  needed  into  one  executable,  suitable  for  booting  directly  or  converting  into 
ROM. 


•  The  application  has  complete  control  of  the  computer  hardware. 

• The application runs in hard real-time with very low overhead and variability.

•  The  minimal  Maruti/Standalone  core  library  currently  consists  of  about  16  KB  of  code 
and  16  KB  of  data. 


•  The  optional  Maruti  Distributed  Operation  support  (including  network  driver)  is  about 
14  KB  of  code  and  9  KB  of  data. 


• The optional Maruti graphics package currently consists of, for the standard VGA
version, 10 KB of code and 20 KB of data (plus 150 KB for a secondary frame buffer for
best performance).

7.4.  Maruti/Mach  Real-Time  Environment 

The original execution environment for Maruti-2 was a modified version of the CMU Mach
3.0 kernel. Maruti/Mach is potentially useful in hybrid environments in which the real-time
components co-exist with Mach and Unix processes on the same CPU. Because of
preemptability problems in CMU Mach, we will not be distributing Maruti/Mach until it can
be rehosted onto OSF1/MK real-time kernels.

The  Maruti/Mach  features  include  the  following: 

•  A  calendar-based  real-time  scheduler  has  been  added  to  the  CMU  Mach  3.0  kernel. 
This  scheduler  takes  precedence  over  the  existing  Mach  scheduler,  running  Maruti 
elemental units from the calendar at the proper release time.


•  The  Maruti  application  and  most  of  the  runtime  system  run  as  normal  Mach  user-level 
tasks and threads, which are wired down in memory.

• The Maruti application may communicate with non-Maruti Unix and Mach processes
through  shared  memory. 

•  The  Maruti/Mach  kernel  maintains  runtime  information  for  each  elemental  unit 
executed,  and  makes  that  information  available  to  the  user-level  code  for  worst-case 
computation  time  analysis. 


•  Parts  of  the  CMU  Mach  kernel  remain  unpreemptable.  Nevertheless,  on  a  dedicated 
system  we  can  achieve  release  time  variability  of  about  100  microseconds.  The  context 
switch  time  is  about  200  microseconds. 


•  The  new  release  of  OSF  Research  Institute  Mach  MK6.0  addresses  most  of  the  Mach 
kernel  preemptability  concerns.  We  will  be  porting  Maruti/Mach  to  this  base  in  the  near 
future. 


8.  Future  Directions 

The  Maruti  Project  is  an  ongoing  research  effort.  We  hope  to  extend  the  current  system  in 
a number of possible directions. Of course, since this is a research project, we expect our
ideas  to  evolve  over  time  as  we  gain  experience  and  get  feedback  from  users. 

8.1.  Scheduling  and  Analysis  Extensions 

Preemptable  Scheduling  of  Hard-Real-Time  Tasks 

We  are  planning  to  extend  our  scheduling  approach  to  incorporate  controlled  preemptions 
of  tasks.  To  date  we  have  concentrated  on  using  non-preemptable  executions  of  tasks, 
which simplifies scheduling and eases exclusion problems in application development.
However, the non-preemptability approach to exclusion does not scale to a
multiprocessor, as threads running on different processors can interfere with each other.
Controlled  preemption  is  more  powerful,  as  it  allows  scheduling  of  long-running  tasks 
concurrently  with  high  frequency  tasks.  Preemption  will  remain  under  the  control  of  the 
application. 

Language support for atomic actions will be developed to replace the assumption of
non-preemptable EUs. Action statements will serve to delineate sections of code on which
precise timing requirements can be imposed by the application designer. Combined with
critical region statements (already implemented), actions will allow the programmer to
specify precisely the desired timing and resource interrelationships in a manner that is
scalable to a multiprocessor or network cluster, unlike the non-preemptability assumption.

We will extend the Maruti run-time system to handle preemptable hard real-time
tasks. This will be done in coordination with the analysis tools, which will generate
multiple calendar entries for the preempted EUs. All but the last entry for the EU will be
marked as preemptable, and all but the first will be marked as continuation entries. This is
enough information for the run-time scheduler to correctly handle the preemption in a
controlled manner, even when the EU completes early.
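
The marking rule for the calendar entries of a preempted EU can be written down in a few lines. The structures and names below are hypothetical, and the treatment of an EU that completes early (later continuation entries are simply skipped) is an assumption about one plausible scheduler behavior, not a description of the planned implementation.

```c
#include <assert.h>

struct slice {
    long start, length;   /* one time slot granted to the EU */
};

struct pcal_entry {
    int  eu_id;
    long start, length;
    int  preemptable;     /* the EU may be suspended at the end of this slot */
    int  continuation;    /* resumes an EU started in an earlier slot */
};

/* Emit one calendar entry per slice of a preempted EU: all but the
 * last are preemptable, all but the first are continuations. */
void make_entries(int eu_id, const struct slice *s, int n,
                  struct pcal_entry *out)
{
    assert(n >= 1);
    for (int i = 0; i < n; i++) {
        out[i].eu_id = eu_id;
        out[i].start = s[i].start;
        out[i].length = s[i].length;
        out[i].preemptable = (i < n - 1);
        out[i].continuation = (i > 0);
    }
}

/* Assumed early-completion rule: a continuation entry has nothing to
 * run once the EU has already reported completion. */
int entry_should_run(const struct pcal_entry *e, int eu_already_done)
{
    return !(e->continuation && eu_already_done);
}
```

An EU that is not preempted at all yields a single entry that is neither preemptable nor a continuation, which matches the current non-preemptable behavior.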

Integration of Time-Based and Priority-Based Scheduling

We  plan  to  integrate  the  time-based  and  priority-based  scheduling  in  a  single  framework. 
To  date  we  have  concentrated  on  time-based  scheduling  only.  To  support  other  scheduling 
paradigms  within  the  time-based  framework,  we  may  reserve  time  slots  in  the  schedule  and 
associate  a  queue  of  waiting  tasks  which  are  executed  on  the  basis  of  their  priorities.  In  this 
way  we  can  implement  rate-monotonic  style  static  priority  schemes  as  well  as  Earliest- 
Deadline-First  style  dynamic  priority  schemes  within  the  Maruti  framework.  However,  in 
order  to  assure  that  the  tasks  executed  under  priority-based  scheduling  will  continue  to  meet 
their  temporal  requirements,  extensions  to  the  analysis  techniques  are  required.  We  will 
develop  analysis  techniques  suitable  for  this  purpose. 

We will extend the Maruti implementation to support non-calendar schedulers, such
as priority-based or earliest-deadline-first schedulers. These schedulers will run in
particular slots specified in the Maruti calendar, or when the calendar is idle.

POSIX-RT  Subset  API 

In a related area, we plan to study the use of a subset of the POSIX API as the Maruti API
for  soft  and  non-real-time  tasks.  We  will  implement  as  much  of  the  POSIX-RT  API  as  is 
appropriate  and  practicable. 

Asynchronous  Events 

Generally, in a time-based system, events are polled for at the maximum frequency at
which they are expected. This type of event handling is easy to analyze within the
time-based framework, and makes explicit the need to reserve enough time to handle the event
stream at its worst-case arrival rate. At this worst-case rate, polling is more efficient than
interrupt-driven event handling because the interrupt overhead is avoided. However, at low
event rates, polling is less efficient and fragments the CPU idle time (where we define idle
time from the point of view of hard real-time tasks). While conservation of idle time is not
an issue for small controllers, it becomes very important when there are soft and
non-real-time tasks running in the system.

Currently,  Maruti  takes  the  polling  approach  to  ease  analysis  and  to  better  handle 
the worst-case rate. We plan to study the analysis required to accommodate asynchronous
events within a calendar schedule. Our intended approach is to work with a specified
maximum  frequency,  relative  deadline,  and  computation  time  of  the  asynchronous  event, 
and  to  reserve  enough  time  in  the  calendar  for  the  event  to  occur  at  its  maximum  frequency. 
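
As a rough illustration of the reservation arithmetic, the sketch below computes how much calendar time one frame must set aside for an asynchronous event at its maximum frequency. It is a simplification under stated assumptions: the maximum frequency is expressed as a minimum inter-arrival time in ticks, the frame is treated as a half-open interval, and only the total reserved time is computed, not where the slots must be placed to meet the relative deadline. All names are hypothetical.

```c
#include <assert.h>

struct async_event {
    long min_interarrival;  /* 1 / maximum frequency, in ticks */
    long rel_deadline;      /* relative deadline, in ticks */
    long compute;           /* worst-case handler computation time */
};

/* A handler that needs longer than its relative deadline can never be
 * scheduled in time, whatever the calendar looks like. */
int handler_fits(const struct async_event *e)
{
    return e->compute <= e->rel_deadline;
}

/* Largest number of arrivals that can fall inside one frame. */
long max_arrivals(const struct async_event *e, long frame)
{
    assert(e->min_interarrival > 0 && frame > 0);
    /* Ceiling of frame / min_interarrival. */
    return (frame + e->min_interarrival - 1) / e->min_interarrival;
}

/* Calendar time to set aside per frame for the worst-case arrival rate. */
long reserved_time(const struct async_event *e, long frame)
{
    return max_arrivals(e, frame) * e->compute;
}
```

For an event with a minimum inter-arrival time of 25 ticks and a 3-tick handler, a 60-tick frame can see three arrivals, so 9 ticks would be reserved.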

We  will  extend  the  Maruti  run-time  system  to  register  and  dispatch  event  handlers  in 
response to external events. Included in this extension will be the ability to detect and
appropriately handle overload conditions (i.e., when the events occur more quickly than
expected).

Multi-Dimensional  Resource  Scheduling  Research 

A typical real-time application requires several resources for it to execute. While CPU is the
most  critical  resource,  others  have  to  be  made  available  in  a  timely  manner.  Generation  of 
schedules  for  multiple  resources  is  known  to  be  a  difficult  problem.  Our  approach  to  date 
has  been  to  develop  efficient  search  techniques,  such  as  one  based  on  simulated  annealing. 

Realistic  problems  contain  a  variety  of  interdependencies  among  tasks  which  must 
be  reflected  as  constraints  in  scheduling.  We  plan  to  develop  efficient  techniques  for 
scheduling  the  allocation  and  deallocation  of  portions  of  multidimensional  resources.  In 
particular,  we  will  address  the  problems  of  allocation  and  management  of  resources  such  as 
memory and disk space, that can accommodate many entities simultaneously.

Scheduling System-Specific Topologies

In  a  related  area,  many  communications  networks  have  more  complex  structures  than  a 
simple  bus  and  cannot  be  treated  as  a  single  dedicated  resource.  We  will  study  the 
extension  of  our  scheduling  algorithms  to  support  point-to-point  meshes  of  nodes  (with 
store-and-forward  of  messages),  switched  networks  (such  as  MyriNet),  and  sophisticated 
backplanes  such  as  that  used  in  the  Intel  Paragon. 

We  will  investigate  the  use  of  a  general  framework  for  specifying  the  properties  of 
connection topologies to the Maruti scheduler. In the worst cases, the scheduler for a
complex interconnection technology may have to be programmed explicitly. To handle
such cases, we will develop a modular interface into our allocator/scheduler into which
such backplane-specific schedulers can be plugged.
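One way to picture such a modular interface is a registry of topology-specific scheduling functions behind a single entry point. The sketch below is an assumption about shape only: the names `register`, `schedule_network`, the message tuple, and the two toy topologies are invented for illustration and are not the Maruti allocator/scheduler interface.

```python
from typing import Callable, Dict, List, Tuple

# A message to schedule: (source node, destination node, transmit time in ticks).
Message = Tuple[str, str, int]
# A topology scheduler maps messages to (message, start tick) pairs.
TopologyScheduler = Callable[[List[Message]], List[Tuple[Message, int]]]

_REGISTRY: Dict[str, TopologyScheduler] = {}


def register(topology: str):
    """Decorator that plugs a topology-specific scheduler into the registry."""
    def wrap(fn: TopologyScheduler) -> TopologyScheduler:
        _REGISTRY[topology] = fn
        return fn
    return wrap


@register("bus")
def schedule_bus(messages):
    """Single shared bus: messages are sent strictly one after another."""
    out, clock = [], 0
    for msg in messages:
        out.append((msg, clock))
        clock += msg[2]
    return out


@register("switched")
def schedule_switched(messages):
    """Idealised switch: only messages that share an endpoint are serialised."""
    busy_until = {}
    out = []
    for msg in messages:
        src, dst, ticks = msg
        start = max(busy_until.get(src, 0), busy_until.get(dst, 0))
        out.append((msg, start))
        busy_until[src] = busy_until[dst] = start + ticks
    return out


def schedule_network(topology: str, messages: List[Message]):
    """Entry point used by the (hypothetical) allocator: dispatch by topology name."""
    return _REGISTRY[topology](messages)
```

A new backplane would then be supported by registering one more function, without touching the dispatch code.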

Static  Estimation  of  Execution  Times 

Currently,  execution  times  are  derived  through  extensive  testing  of  the  program  on  the 
target  hardware  environment.  Deriving  the  execution  time  through  static  analysis  is 
hampered by the data dependencies present in large numbers in most programs.

We  will  investigate  the  use  of  static  analysis  to  help  prove  the  execution  time  limits 
of  programs.  While  generating  a  reasonable  computation  time  estimate  through  static 
analysis  is  not  feasible  in  general,  it  is  possible  to  get  accurate  results  for  large  segments  of 
a  program,  and  to  clearly  identify  the  existing  data  dependencies  so  that  the  programmer 
can, through program modifications or directives to the analysis tool, eliminate, curtail, or
characterize the data dependencies well enough to get very useful verification of the time
properties  of  the  program. 
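The idea of bounding large segments statically while flagging the remaining data dependencies can be sketched on a toy program tree. The node types, cycle costs, and the convention that a loop bound of `None` marks a data-dependent loop are assumptions made for this example, not the planned analysis tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Block:
    """Straight-line code with a known cost in cycles."""
    cycles: int


@dataclass
class Seq:
    """Parts executed one after another."""
    parts: List["Node"]


@dataclass
class Branch:
    """Data-dependent choice: the bound assumes the slower arm."""
    then: "Node"
    orelse: "Node"


@dataclass
class Loop:
    """Loop whose iteration bound is None when it depends on data."""
    body: "Node"
    bound: Optional[int]


Node = Union[Block, Seq, Branch, Loop]


def wcet(node, unresolved):
    """Return a worst-case cycle bound, or None if one cannot be derived.

    Loops without a bound are appended to `unresolved` so that the
    programmer can supply a directive (here: fill in `bound`) and re-run.
    """
    if isinstance(node, Block):
        return node.cycles
    if isinstance(node, Seq):
        costs = [wcet(part, unresolved) for part in node.parts]
        return None if None in costs else sum(costs)
    if isinstance(node, Branch):
        a = wcet(node.then, unresolved)
        b = wcet(node.orelse, unresolved)
        return None if a is None or b is None else max(a, b)
    if isinstance(node, Loop):
        body = wcet(node.body, unresolved)
        if node.bound is None:
            unresolved.append(node)
            return None
        return None if body is None else node.bound * body
    raise TypeError(f"unknown node: {node!r}")
```

Setting `bound` on a reported loop plays the role of a directive to the analysis tool: once every loop has a bound, the whole program gets a finite estimate.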

Temporal  Debugging 

When  we  develop  real-time  applications  we  need  techniques  for  observing  the  temporal 
behavior  of  programs.  For  their  functional  characteristics  we  can  use  standard  debuggers 
which  permit  the  observation  of  the  state  of  execution  at  any  stage.  This,  however, 
destroys the temporal relationships completely. In Maruti/Virtual we provide the facilities of
controlling the execution of all parts of an application with respect to a virtual time which
advances under the control of keyboard directives. Thus we can pause the execution at any
virtual  time  instant  with  the  assurance  that  all  temporal  relationships  with  respect  to  this 
instant  are  accurately  reflected  in  the  state  of  the  program.  We  use  the  term  temporal 
debugging  for  this. 
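The essence of temporal debugging, a clock that moves only when the user says so, can be sketched as follows. The `VirtualClock` class and its methods are hypothetical names for this illustration and do not reflect the Maruti/Virtual implementation.

```python
import heapq


class VirtualClock:
    """Virtual time that advances only when explicitly told to."""

    def __init__(self):
        self.now = 0
        self._pending = []  # heap of (due time, sequence number, callback)
        self._seq = 0       # tie-breaker so equal due times run in FIFO order

    def at(self, when, callback):
        """Schedule `callback(clock)` to run at virtual time `when`."""
        heapq.heappush(self._pending, (when, self._seq, callback))
        self._seq += 1

    def advance_to(self, target):
        """Run everything due up to `target`, then stop with time frozen there."""
        while self._pending and self._pending[0][0] <= target:
            when, _, callback = heapq.heappop(self._pending)
            self.now = when
            callback(self)
        self.now = target


def make_periodic(period, log):
    """Build a periodic task that records its release times and re-arms itself."""
    def task(clock):
        log.append(clock.now)
        clock.at(clock.now + period, task)
    return task
```

Calling `advance_to(25)` and then inspecting program state corresponds to pausing at virtual time 25: no task scheduled for a later instant has run, so every temporal relationship up to that point is intact.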

We will conduct research on the theoretical aspects of temporal debugging and
consider its implications. In particular, we will
study how the interactions of programs executing in virtual time with external events which
occur with respect to their own time line should be captured in temporal debugging. We
will  also  study  how  the  virtual  times  of  several  nodes  in  a  distributed  environment  should 
be  coordinated. 

We  will  extend  our  implementation  of  temporal  debugging  tools  in  the 
Maruti/Virtual environment to support temporal debugging of distributed programs, and to
support  fine  grained  modification  of  the  time  line. 

Dynamic Schedule Generation


We  will  develop  the  notion  of  time  horizons  to  support  controlled  modifications  of  the  hard 
real-time calendars at runtime to support programs that generate schedules dynamically.
While  the  run-time  mechanisms  for  modifying  the  calendars  are  already  implemented, 
research  issues  relating  to  finding  safe  points  to  switch  schedules,  and  scheduling  the 
schedulers  themselves,  have  to  be  studied  before  effective  use  can  be  made  of  on-line 
calendar  generation. 

8.2.  Fault  Tolerance 


Maruti currently supports several powerful mechanisms for building fault tolerant
applications: 


•  Maruti  Configuration  Language  (MCL)  constructs  allow  the  application  designer  to 
specify  replication  of  application  subsystems  with  forkers  and  joiners  inserted  into  the 
communication streams, as well as the allocation constraints necessary to correctly
partition  the  replicated  subsystems  for  the  desired  level  of  fault  tolerance. 


• Maruti Programming Language (MPL) allows the programming of application specific
fault  tolerance  components  such  as  forkers  and  joiners,  elemental  unit  monitors,  and 
channel  monitors. 


•  The  run-time  system  supports  multiple  calendars,  allowing  the  application  to  switch  to 
emergency  or  fault  handling  scenarios  in  real  time. 
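To make the forker and joiner roles in the list above concrete, here is a minimal sketch in which a forker feeds the same input to each replica and a joiner takes a strict-majority vote. Majority voting, and treating a crashed replica as an abstention, are assumptions for the example; an application could program a different joining policy.

```python
from collections import Counter


def fork(message, replicas):
    """Forker: deliver the same input to every replica and collect the outputs."""
    outputs = []
    for replica in replicas:
        try:
            outputs.append(replica(message))
        except Exception:
            # A crashed replica contributes no vote (None is reserved for this).
            outputs.append(None)
    return outputs


def join(outputs):
    """Joiner: return the value produced by a strict majority of replicas."""
    votes = Counter(o for o in outputs if o is not None)
    if votes:
        value, count = votes.most_common(1)[0]
        if count > len(outputs) // 2:
            return value
    raise RuntimeError("no majority among replicas")
```

With three replicas this arrangement masks one faulty or crashed replica, which is why the allocation constraints matter: replicas placed on the same node would fail together.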

We  plan  to  extend  the  existing  mechanisms  by  providing  tools  and  new 
mechanisms  to  better  automate  the  process  of  building  fault  tolerant  applications.  The  new 
features will include:


•  A  library  of  forkers  and  joiners  that  can  be  incorporated  into  applications. 


•  Support  for  multicast  messages. 


•  Better  support  in  Maruti  Programming  Language  (MPL)  for  EU  and  channel  monitors. 


•  Automatic  replication  of  subsystems,  and  analysis  of  fault  tolerance  properties  through 
MAGIC,  the  graphical  integrator  described  below. 

8.3.  Clock  Synchronization 

Currently, distributed Maruti handles clock drift at boot-up time, and thereafter time slave
nodes  simply  adopt  the  time-master's  clock  periodically.  This  scheme  is  suitable  for  many 
applications,  but  is  not  ideal  for  embedded  control  systems  that  will  suffer  from  a 
discontinuous  time  jump. 

To  address  this  problem  we  plan  to  develop  and  implement  time-synchronization 
algorithms  that  operate  concurrently  with  the  distributed  real-time  program  to  continually 
adjust the clocks on all the nodes, taking into account changes in their relative drift. This
will most likely involve a regular time pulse from a master clock, from which the other
nodes  continually  measure  their  drift  and  fine-tune  their  tick  rates.  Since  the  clock  drifts  are 
about  one  order  of  magnitude  less  than  the  communication  latency  variances,  a  simple 
algorithm  will  not  suffice  here. 
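Because each individual pulse is blurred by latency jitter larger than the drift being measured, one plausible approach is to fit a line through many pulses and adjust the tick rate from its slope, rather than stepping the clock. The least-squares estimator below is an assumption chosen for illustration; the report does not say which algorithm will be used.

```python
def estimate_rate(samples):
    """Least-squares slope of local receive time against master timestamp.

    samples: list of (master_timestamp, local_receive_time) pairs.
    A slope of 1.0 means the local clock advances at the master's rate;
    a constant latency offset does not affect the slope.
    """
    n = len(samples)
    mean_m = sum(m for m, _ in samples) / n
    mean_l = sum(l for _, l in samples) / n
    cov = sum((m - mean_m) * (l - mean_l) for m, l in samples)
    var = sum((m - mean_m) ** 2 for m, _ in samples)
    return cov / var


def corrected_tick(nominal_tick, samples):
    """Clock time to credit per hardware tick so the local rate matches the master.

    Scaling the tick changes the clock's rate smoothly instead of
    introducing a discontinuous time jump.
    """
    return nominal_tick / estimate_rate(samples)
```

Averaging over a window of pulses is what lets a drift of, say, 100 parts per million show through jitter that is much larger on any single sample.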

8.4.  Heterogeneous  Operation 

We  will  extend  our  communications  agents  and  boot  protocol  to  translate  typed  Maruti 
messages  between  heterogeneous  hosts  when  needed.  The  off-line  Maruti  analysis  tools 
already  collect  information  on  the  types  of  the  channel  endpoints  for  type-checking  the 
connection. We will carry this information through to the run-time system for use in those
channels that are connected between heterogeneous nodes.
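A minimal sketch of how channel type information could drive translation between hosts of different byte order is shown below, using Python's standard `struct` module. The message layout `SENSOR_MSG` and the `translate` helper are hypothetical; real Maruti message types and wire formats are not specified in this report.

```python
import struct

# Hypothetical typed channel description: (field name, struct type code).
SENSOR_MSG = [("id", "H"), ("reading", "i"), ("scale", "f")]


def _fmt(fields, byte_order):
    """Build a struct format string; '<' is little-endian, '>' is big-endian."""
    return byte_order + "".join(code for _, code in fields)


def translate(payload, fields, src_order, dst_order):
    """Re-encode a typed message from the sender's byte order to the receiver's.

    When both endpoints agree, the payload is passed through untouched,
    so homogeneous channels pay no translation cost.
    """
    if src_order == dst_order:
        return payload
    values = struct.unpack(_fmt(fields, src_order), payload)
    return struct.pack(_fmt(fields, dst_order), *values)
```

The translation is only possible because the field types are known; an untyped byte stream could not be converted safely.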

8.5.  MPL/Ada 

We will incorporate Maruti Programming Language (MPL) features and analysis into the
Ada  95  programming  language  as  we  did  for  ANSI  C  in  the  current  MPL,  which  we  will 
now  refer  to  as  MPL/C.  Implementing  MPL/Ada  will  involve  the  following  tasks: 

•  A  detailed  design  review  studying  those  features  of  Ada  which  are  compatible  with 
Maruti  and  those  that  are  not,  and  how  best  to  proceed  with  the  implementation  of 
MPL/Ada. 

•  Port  GNU  Ada  (GNAT)  to  our  NetBSD  development  environment. 

•  Implement  as  much  of  the  Ada  run-time  environment  as  is  practicable  on  the  Maruti 
run-time. 

•  Install  hooks  into  GNAT  to  extract  the  resource  usage  information  we  need.  We  expect 
this  work  will  leverage  heavily  from  the  MPL/C  work,  as  GNAT  is  derived  from  the 
same  back-end  code  base  as  GNU  C. 

•  Develop  and  enforce  within  GNAT  those  restrictions  on  Ada  constructs  needed  in  order 
to  preserve  the  properties  needed  for  our  hard  real-time  analysis. 

•  Add  support  for  Maruti  primitives  to  the  language.  Some  Maruti  primitives  might  be 
implementable directly through existing Ada facilities and thus will not require language
extensions. 

8.6. Graphical Tools

Graphical  Program  Development  Tools 

Currently,  Maruti  applications  are  pulled  together  by  an  MCL  specification,  which  takes  the 
form  of  a  procedural  language  whose  primitive  operations  instantiate  and  bind  together  the 
parts  of  the  application.  This  type  of  specification  language  is  complete,  allowing  the 
specification  of  large,  complex  applications  connected  in  arbitrary  ways.  However,  such 
completeness  makes  MCL  relatively  low-level  and  tedious  to  program. 

We  are  developing  graphical  program  development  tools  which  allow  the 
application designer to pull together the modules using an entirely graphical user interface,
avoiding MCL programming. The on-screen representation of modules can be
interconnected  with  channels  and  grouped  into  hierarchical  subsystems.  The  application 
designer  will  be  able  to  zoom  in  and  out  to  view  the  application  at  several  levels. 

The  graphical  environment  will  allow  both  the  integration  of  existing  modules  and 
the  development  of  the  interfaces  of  modules  that  have  not  yet  been  written.  The  tools  will 
generate  template  MPL  code  for  those  modules.  In  this  way  the  graphical  environment 
functions as a design tool and program generator as well as an integration environment.

The  graphical  environment  will  have  fault-tolerance  analysis  built  into  it.  Single 
points  of  failure  will  be  identified  on-screen.  The  user  will  be  able  to  replicate  entire 
subsystems  at  once,  with  the  forker  and  joiner  modules  and  allocation  constraints 
introduced  into  the  application  automatically  by  the  system. 

This graphical style of application integration will greatly facilitate the building and
deployment of reusable software components: modules built to be easily customized and
reintegrated  into  many  applications.  Given  a  suitable  library  of  reusable  component 
modules  and  the  graphical  integrator,  it  will  be  possible  for  non-programmers  to  build  large 
custom  applications  from  these  parts. 

Figure 5: Prototype Graphical Resource Scheduling Tool

Graphical Resource Management Tools

Along  with  the  graphical  software  development  tools,  we  are  pursuing  graphical  resource 
management  tools.  These  are  a  non-programmer's  interface  into  the  advanced  Maruti 
scheduling  technology.  The  Maruti  allocator/scheduler  works  with  the  abstract  concepts  of 
schedulable entities, available resources, and various types of constraints on the placement
of  entities  and  resources.  In  the  Maruti  operating  system,  the  scheduling  entities  are  EUs, 
and  the  resources  are  CPUs,  network,  memory,  and  devices  -  but  in  fact  any  type  of  entity 
or  resource  can  be  manipulated  by  the  allocator/scheduler. 

A  graphical  resource  management  tool  will  allow  the  specification  of  these  entities, 
resources,  and  constraints  on  screen  in  a  way  more  oriented  towards  the  general  user.  With 
this  tool  users  should  be  able  to  use  Maruti  scheduling  technology  to  schedule  classes, 
busses,  or  projects,  for  example. 

We  have  built  a  small  prototype  of  the  graphical  resource  manager.  The  prototype 
displays the EU graph input to the scheduler as well as the calendar output of the scheduler.
The  user  can  edit  the  EU  graph  and  its  constraints  and  reschedule  with  the  click  of  a  button. 
The  resulting  resource  calendar  is  redisplayed. 
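The edit-and-reschedule loop can be pictured as a function from an entity graph to a resource calendar. The sketch below uses a simple list scheduler over a topological order; the data layout and the `build_calendar` name are assumptions for illustration, and the real allocator/scheduler handles far richer constraints.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_calendar(eus, precedes):
    """Place each entity at the earliest time its resource and predecessors allow.

    eus:      name -> (resource, duration)
    precedes: name -> set of entities that must finish first
    Returns   resource -> list of (start, end, name) entries.
    """
    graph = {name: precedes.get(name, set()) for name in eus}
    finish, free_at, calendar = {}, {}, {}
    for name in TopologicalSorter(graph).static_order():
        resource, duration = eus[name]
        ready = max((finish[p] for p in graph[name]), default=0)
        start = max(ready, free_at.get(resource, 0))
        finish[name] = free_at[resource] = start + duration
        calendar.setdefault(resource, []).append((start, start + duration, name))
    return calendar
```

Nothing in the function is specific to CPUs or networks, which mirrors the point that the same machinery could schedule classes, busses, or projects: only the entity and resource names change.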


9.  Availability 

We are pleased to announce the availability of the Maruti 3.1 Hard Real-Time Operating
System  and  Development  Environment. 

With  Maruti  3.1,  we  are  entering  a  new  phase  of  our  project.  We  have  an  operating 
system  suitable  for  field  use  by  a  wider  range  of  users,  and  we  are  embarking  on  the 
integration  of  our  time-based,  hard  real-time  technology  with  industry  standards  and  more 
traditional  event-based  soft-  and  non-real-time  systems.  For  this,  we  are  greatly  interested 
in  the  feedback  from  users  as  to  the  direction  of  evolution  of  the  system. 


For  the  Maruti  3.1  project,  we  will  be  pursuing  the  integration  of  a  POSIX  interface 
for soft and non-real-time applications, the use of Ada for Maruti programming, support
for asynchronous events and soft/non-real-time schedulers within the time-based
framework,  and  heterogeneous  Maruti  networks. 

For  this  user-oriented  phase  of  the  project  we  will  be  making  regular  releases  of  our 
software  available  to  allow  interested  parties  to  track  and  influence  our  development.  To 
begin  this  phase  we  are  making  our  current  base  hard  real-time  operating  system  and  its 
development  environment  available.  This  is  an  initial  test  release. 

Maruti 3.1 will be made available to interested parties on request, via Internet ftp.
Please send electronic mail to maruti-dist@cs.umd.edu for details. More information about
the Maruti Project, as well as papers and documentation, is available via the World Wide
Web at:


http://www.cs.umd.edu/projects/maruti/

9.1.  Runtime  System 

The  Maruti  3.1  embeddable  hard  real-time  runtime  system  for  distributed  and  single-node 
systems  includes  the  following  features: 

• The core Maruti runtime system is small: 16 KB of code for the single-node core, 30 KB
of code for the distributed core.


•  The  core  provides  a  calendar-based  scheduler,  threads,  distributed  message  passing 
using  Time  Division  Multiplexed  Access  (TDMA)  over  the  network,  and  tight  time 
synchronization  between  network  nodes. 

•  Also  included  in  the  runtime  system  is  a  graphics  library  suitable  for  system  monitoring 
displays  as  well  as  simulations. 


• Maruti runs on PC-AT compatible computers using the Intel i386 (with i387
coprocessor), i486DX, or Pentium processors. Distributed operation currently requires
a 3Com 3c507 Ethernet card. The graphics library supports standard VGA and Tseng
Labs ET-4000-based Super-VGA. Support for other SVGA chipsets is forthcoming.
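For readers unfamiliar with TDMA, the sketch below shows the basic arithmetic of a time-divided network: each node may transmit only in its own slot. A fixed round-robin frame with equal-length slots is an assumption made for the example; the report does not describe how Maruti lays out its TDMA slots.

```python
def tdma_owner(now, slot_len, nodes):
    """Node that owns the network at time `now` under a round-robin frame."""
    return nodes[(now // slot_len) % len(nodes)]


def next_send_time(now, slot_len, nodes, me):
    """Start of the earliest slot owned by `me` that begins at or after `now`."""
    frame = slot_len * len(nodes)          # one full rotation through all nodes
    offset = nodes.index(me) * slot_len    # where my slot sits inside a frame
    start = (now // frame) * frame + offset
    return start if start >= now else start + frame
```

Since slot ownership is a pure function of the time, nodes never contend for the medium, but the scheme depends on the tight time synchronization between nodes that the core provides.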

9.2.  Development  Environment 

Maruti  3.1  includes  a  complete  development  environment  for  distributed  embedded  hard 
real-time applications. The development environment runs on NetBSD Unix and includes
the  following: 

• The Maruti/Virtual debugging environment simulates the Maruti runtime system
within the development environment. The system clock in this environment tracks
virtual time, which can be sped up or slowed down in relation to the actual time,
single-stepped, or stopped. This allows temporal debugging of the application. Within
Maruti/Virtual, traces of the application scheduling and network traffic can be monitored
in  the  debugging  session. 

• The ANSI-C based Maruti Programming Language (MPL/C). MPL adds modules,
message  passing  primitives,  shared  memory,  periodic  functions,  message-invoked 
functions,  and  exclusion  regions  to  ANSI  C.  MPL  is  processed  by  a  version  of  the 
GNU  C  compiler  which  has  been  modified  to  recognize  the  new  MPL  features,  and  to 
output  information  about  the  resources  used  by  the  MPL  program. 

•  The  Maruti  Configuration  Language  (MCL).  MCL  allows  the  system  designer  to 
specify the placement, timing constraints, and interconnections of all the modules in an
application. MCL is a powerful interpreted C-like language, allowing complex,
hierarchical configuration specifications, including replication of components and
installation-site specific sizing of the application. The MCL processor analyses the
application graph for completeness, and type-checks all connections.

• The Maruti Allocator/Scheduler. The Maruti allocation and scheduling tool analyses the
information generated by the MPL compiler and the MCL integrator to find an
allocation and scheduling of the tasks of a distributed application across the nodes of a
Maruti  network.  All  relative  and  global  timing,  exclusion,  and  precedence  constraints 
are  taken  into  account  in  finding  a  schedule,  as  are  the  network  speed  and  scheduling 
parameters. 

•  The  Maruti  Timing  Trace  Analyzer.  The  Timing  Analyzer  calculates  worst-case 
computation times from timing files output by the runtime system. Computation times
are  calculated  for  each  scheduling  unit  in  the  application,  and  these  times  can  be  fed 
back  into  the  Allocator/Scheduler  for  more  precise  scheduling  analysis. 

• The Maruti Runtime Binder (mbind). One of the features of Maruti is the late binding
of an application to a particular runtime system. The same application binaries can be
combined with different system libraries to build a binary customized for a particular
application  in  a  particular  setting.  Only  those  portions  of  the  system  library  needed  for 
that  binding  are  included.  Mbind  manages  this  final  step. 

• The Maruti Application Builder (mbuild). Mbuild automates the process of building an
application  by  generating  for  the  programmer  a  customizable  makefile  that  manages  the 
complete  process  of  compiling,  configuring,  scheduling,  and  binding  an  application. 
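The feedback step performed by the Timing Trace Analyzer in the list above amounts to taking, for each scheduling unit, the longest execution observed in the trace. The sketch below assumes a trace of `(unit, start, end)` records; the actual format of the timing files written by the runtime system is not given in this report.

```python
def worst_case_times(trace):
    """Longest observed computation time per scheduling unit.

    trace: iterable of (unit_name, start_time, end_time) records.
    Returns a dict unit_name -> maximum (end_time - start_time).
    """
    worst = {}
    for unit, start, end in trace:
        worst[unit] = max(worst.get(unit, 0), end - start)
    return worst
```

These figures are measurements, so they are only as good as the test coverage that produced the trace; they tighten the scheduler's estimates but do not prove a bound.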